Network switch using network processor and methods

ABSTRACT

A network switch apparatus, components for such an apparatus, and methods of operating such an apparatus, in which data flow handling and flexibility are enhanced by cooperation among a plurality of interface processors and a suite of peripheral elements formed on a semiconductor substrate. The interface processors and peripherals together form a network processor capable of cooperating with other elements, including an optional switching fabric device, in executing instructions directing the flow of data in a network.

RELATED APPLICATIONS

[0001] The interested reader is referred, for assistance in understanding the inventions here described, to the following prior disclosures, which are relevant to the description which follows and each of which is hereby incorporated by reference into this description as fully as if here repeated in full:

[0002] U.S. Pat. No. 5,008,878 issued Apr. 16, 1991 for High Speed Modular Switching Apparatus for Circuit and Packet Switched Traffic;

[0003] U.S. Pat. No. 5,724,348 issued Mar. 3, 1998 for Efficient Hardware/Software Interface for a Data Switch;

[0004] U.S. Pat. No. 5,787,430 issued Jul. 28, 1998 for Variable Length Data Sequence Back Tracking and Tree Structure;

[0005] U.S. patent application Ser. No. 09/312,148 filed May 14, 1999, and entitled “System Method and Computer Program for Filtering Using Tree Structure”; and

[0006] U.S. patent application Ser. No. 09/330,968 filed Jun. 11, 1999, and entitled “High Speed Parallel/Serial Link for Data Communication”.

BACKGROUND OF THE INVENTION

[0007] This invention relates to communication network apparatus such as is used to link together information handling systems or computers of various types and capabilities, and to components of such apparatus. In particular, this invention relates to scalable switch apparatus and components useful in assembling such apparatus. This invention relates to an improved and multi-functional interface device and the combination of that device with other elements to provide a media speed network switch. The invention also relates to methods of operating such apparatus which improve the data flow handling capability of network switches.

[0008] The description which follows presupposes knowledge of network data communications and of switches and routers as used in such communications networks. In particular, the description presupposes familiarity with the ISO model of network architecture, which divides network operation into layers. A typical architecture based upon the ISO model extends upward from Layer 1 (also sometimes identified as “L1”), the physical pathway or media through which signals are passed, through Layers 2, 3, 4 and so forth to Layer 7, the layer of applications programming running on a computer system linked to the network. In this document, mention of L1, L2 and so forth is intended to refer to the corresponding layer of a network architecture. The disclosure also presupposes a fundamental understanding of the bit strings known as packets and frames in such network communication.

[0009] In today's networked world, bandwidth is a critical resource. Increasing network traffic, driven by the Internet and other emerging applications, is straining the capacity of network infrastructures. To keep pace, organizations are looking for better technologies and methodologies to support and manage traffic growth and the convergence of voice with data.

[0010] Today's dramatic increase in network traffic can be attributed to the popularity of the Internet, a growing need for remote access to information, and emerging applications. The Internet alone, with its explosive growth in e-commerce, has placed a sometimes insupportable load on network backbones. It is also the single most important reason that data traffic volumes now exceed voice traffic for the first time. The growing demands of remote access applications, including e-mail, database access, and file transfer, are further straining networks.

[0011] The convergence of voice and data will play a large role in defining tomorrow's network environment. Currently, the transmission of data over Internet protocol (IP) networks is free. Because voice communications will naturally follow the path of lowest cost, voice will inevitably converge with data. Technologies such as Voice over IP (VoIP), Voice over ATM (VoATM), and Voice over Frame Relay (VoFR) are cost-effective alternatives in this changing market. However, to make migration to these technologies possible, the industry has to ensure quality of service (QoS) for voice and determine how to charge for voice transfer over data lines. The Telecommunications Act of 1996 further complicates this environment. This legislation will reinforce a symbiotic relationship between the voice protocol of choice, ATM, and the data protocol of choice, IP.

[0012] Integrating legacy systems is also a crucial concern for organizations as new products and capabilities become available. To preserve their investments in existing equipment and software, organizations demand solutions that allow them to migrate to new technologies without disrupting their current operations.

[0013] Eliminating network bottlenecks continues to be a top priority for service providers. Routers are often the source of these bottlenecks. However, network congestion in general is often misdiagnosed as a bandwidth problem and is addressed by seeking higher-bandwidth solutions. Today, manufacturers are recognizing this difficulty. They are turning to network processor technologies to manage bandwidth resources more efficiently and to provide, at wire speed, the advanced data services that are commonly found in routers and network application servers. These services include load balancing, QoS, gateways, firewalls, security, and web caching.

[0014] For remote access applications, performance, bandwidth-on-demand, security, and authentication rank as top priorities. The demand for integration of QoS and CoS, integrated voice handling, and more sophisticated security solutions will also shape the designs of future remote access network switches. Further, remote access will have to accommodate an increasing number of physical media, such as ISDN, T1, E1, OC-3 through OC-48, cable, and xDSL modems.

[0015] Industry consultants have defined a network processor (herein also mentioned as an “NP”) as a programmable communications integrated circuit capable of performing one or more of the following functions:

[0016] Packet classification—identifying a packet based on known characteristics, such as address or protocol

[0017] Packet modification—modifying the packet to comply with IP, ATM, or other protocols (for example, updating the time-to-live field in the header for IP)

[0018] Queue/policy management—reflects the design strategy for packet queuing, de-queuing, and scheduling of packets for specific applications

[0019] Packet forwarding—transmission and receipt of data over the switch fabric and forwarding or routing the packet to the appropriate address
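As an illustration of the packet modification function, the following Python sketch decrements the IPv4 time-to-live field and patches the header checksum incrementally with the one's-complement update of RFC 1624 (HC' = ~(~HC + ~m + m')). It is not drawn from any NP picocode; the function names are hypothetical, and the full RFC 1071 recomputation is included only so the incremental result can be checked against it.

```python
def ones_complement_add(a: int, b: int) -> int:
    """16-bit one's-complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def ipv4_header_checksum(header: bytes) -> int:
    """Full RFC 1071 checksum, treating the checksum field (bytes 10-11) as zero."""
    total = 0
    for i in range(0, len(header), 2):
        if i == 10:
            continue  # skip the checksum field itself
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def decrement_ttl(header: bytearray) -> None:
    """Decrement TTL in place and update the checksum incrementally (RFC 1624, eqn. 3)."""
    if header[8] == 0:
        raise ValueError("TTL already zero; the packet should be discarded")
    old_word = (header[8] << 8) | header[9]  # TTL shares a 16-bit word with Protocol
    header[8] -= 1
    new_word = (header[8] << 8) | header[9]
    old_csum = (header[10] << 8) | header[11]
    # HC' = ~(~HC + ~m + m')
    s = ones_complement_add(~old_csum & 0xFFFF, ~old_word & 0xFFFF)
    s = ones_complement_add(s, new_word)
    new_csum = ~s & 0xFFFF
    header[10] = new_csum >> 8
    header[11] = new_csum & 0xFF
```

The incremental form touches only the one changed 16-bit word, which is why it suits per-packet hardware or picocode better than re-summing the whole header.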

[0020] Although this definition is an accurate description of the basic features of early NPs, the full potential capabilities and benefits of NPs are yet to be realized. Network processors can increase bandwidth and solve latency problems in a broad range of applications by allowing networking tasks previously handled in software to be executed in hardware. In addition, NPs can provide speed improvements through architectures such as parallel distributed processing and pipeline processing designs. These capabilities can enable efficient search engines, increase throughput, and provide rapid execution of complex tasks.

[0021] Network processors are expected to become the fundamental building block for networks in the same way that CPUs are for PCs. Typical capabilities offered by an NP are real-time processing, security, store and forward, switch fabric, and IP packet handling and learning capabilities. NPs target ISO layers two through five and are designed to optimize network-specific tasks.

[0022] The processor-model NP incorporates multiple general purpose processors and specialized logic. Suppliers are turning to this design to provide scalable, flexible solutions that can accommodate change in a timely and cost-effective fashion. A processor-model NP allows distributed processing at lower levels of integration, providing higher throughput, flexibility and control. Programmability can enable easy migration to new protocols and technologies without requiring new ASIC designs. With processor-model NPs, network equipment vendors (NEVs) benefit from reduced non-recurring engineering costs and improved time-to-market.

BRIEF SUMMARY OF THE INVENTION

[0023] One purpose of this invention is to provide a scalable switch architecture for use in a data communication network which is capable of sizing support capabilities to a range of potential demands while improving the speed of handling of data being transferred. This purpose is pursued by providing components, and assemblages of components, which remove from the workload of the processing units involved a greater amount of data handling than has been the case heretofore.

[0024] Another purpose is to provide an interface device or network processor (the terms being used interchangeably) which includes a plurality of sub-assemblies integrated on a single substrate and coacting to provide media rate switching of frames at layer 2, layer 3, layer 4 and layer 5. The interface device may be used as a standalone solution providing a first level of capability for a workgroup switch, as an interconnected solution providing a workgroup switch of higher capability, or scaled further upward in capability by cooperation with a switching fabric device.

BRIEF DESCRIPTION OF THE DRAWINGS

[0025] Some of the purposes of the invention having been stated, others will appear as the description proceeds, when taken in connection with the accompanying drawings, in which:

[0026] FIG. 1 shows a block diagram for an interface device in accordance with this invention.

[0027] FIG. 1A shows a block diagram for the MAC.

[0028] FIGS. 2A through 2D show the interface device interconnected with other components in different system configurations.

[0029] FIG. 3 shows the flow and processing of an encapsulated guided frame.

[0030] FIG. 4 shows the flow and processing of an internal guided frame.

[0031] FIG. 5 shows the generalized format for a Guided Cell.

[0032] FIG. 6 shows the format for Frame Control Information.

[0033] FIG. 7 shows the format for the Correlator.

[0034] FIG. 8 shows the Command Control Information Format.

[0035] FIG. 9 shows the Addressing Information Format.

[0036] FIG. 10 shows the General Form of Structure Addressing.

[0037] FIG. 11 shows a chart for Addressing, Island Encoding.

[0038] FIG. 12A shows a block diagram of the Embedded Processor Complex.

[0039] FIG. 12B shows a schematic of the Embedded Processors.

[0040] FIG. 12C shows a structure for a GxH Processor.

[0041] FIG. 13 shows a block diagram of the memory complex.

[0042] FIG. 14 shows a flowchart for the Fixed Match (FM) search algorithm.

[0043] FIG. 15 shows flows illustrating a data structure without using a Direct Table and with using a Direct Table.

[0044] FIG. 16 shows a block diagram of a switching system such as Prizma.

[0045] FIG. 17 shows a block diagram of a CP.

[0046] FIG. 18 shows a block diagram of the single chip Network Processor highlighting functions in the EDS-UP, EDS-DOWN and the EPC.

DESCRIPTION OF THE PREFERRED EMBODIMENT(S)

[0047] While the present inventions will be described more fully hereinafter with reference to the accompanying drawings, in which preferred embodiments of the present inventions are shown, it is to be understood at the outset of the description which follows that persons of skill in the appropriate arts may modify the inventions here described while still achieving the favorable results of the inventions. Accordingly, the description which follows is to be understood as being a broad, teaching disclosure directed to persons of skill in the appropriate arts, and not as limiting upon the present inventions.

[0048] Apparatus disclosed here is scalable and capable of functioning to interconnect desktop or workgroup switches, aggregate such switches into a network backbone, and provide backbone switching services. The apparatus can support Layer 2, Layer 3, and Layer 4+ forwarding in hardware. Certain forms of the apparatus are designed for desktop or workgroup switch aggregation, while others are targeted as core backbone switches.

[0049] The architecture used for the apparatus is based on an interface device or network processor hardware subsystem and a software library running on a control point, all as more fully described elsewhere in this document. The interface device or network processor subsystem is a high performance frame forwarding engine designed for parsing and translation of L2, L3, and L4+ protocol headers. This allows protocols to be switched at greater speeds using hardware. The interface device or network processor subsystem provides a fast-path through the box, while the software library and control point processor provide the management and route discovery functions needed to maintain the fast-path. The control point processor and the software library running thereon together define the Control Point (CP) of the system. The CP is where the actual bridging and routing protocols, such as Transparent Bridging and OSPF, are run. It can also be referred to as the slow-path of the system.
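The division of labor between the fast-path and the slow-path can be pictured with a toy model: the interface device forwards from a table, and anything it cannot resolve is handed to the control point, which installs an entry so that later frames stay on the fast-path. This is a minimal sketch under stated assumptions; the class names, the dictionary-based table and the "punted" queue are hypothetical stand-ins for the hardware tables and the Guided Frame interface described elsewhere in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FastPath:
    """Hypothetical stand-in for the interface device's hardware forwarding."""
    table: Dict[str, int] = field(default_factory=dict)  # destination -> output port
    punted: List[str] = field(default_factory=list)      # misses awaiting the CP

    def forward(self, destination: str) -> Optional[int]:
        port = self.table.get(destination)
        if port is None:
            self.punted.append(destination)  # table miss: hand off to the slow-path
        return port


@dataclass
class ControlPoint:
    """Hypothetical slow-path: runs route discovery and maintains the fast-path."""
    fast_path: FastPath
    routes: Dict[str, int]  # results of bridging/routing protocols (assumed precomputed)

    def service_punts(self) -> None:
        for destination in self.fast_path.punted:
            if destination in self.routes:
                self.fast_path.table[destination] = self.routes[destination]
        self.fast_path.punted.clear()
```

In use, the first frame to an unknown destination misses and is punted; after `service_punts()` the same destination is resolved entirely on the fast-path.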

[0050] While the apparatus here disclosed supports multi-layer forwarding in hardware, it can also operate as an L2-only switch, and that is its default mode of operation in the simplest form disclosed. All ports are placed in a single domain, allowing any device to communicate with any other device. The apparatus is configurable at L2, giving system administrators the ability to configure features such as grouping ports into separate domains or trunks, configuring Virtual LAN (VLAN) segments, or applying filters to control broadcast and multicast traffic.

[0051] This scalable apparatus has many benefits. First, it gives the system administrator the ability to configure L3 forwarding and routing of IP and IPX traffic using the same hardware being used for L2, and at the same speed. Second, it removes the need for external routers to interconnect campus buildings while increasing performance at the same time. Third, it simplifies or combines the management of L2/L3 service for a building into a single point of control. Finally, it provides value-added features with L4+ functions that allow system administrators to assign different traffic classifications to support mission critical applications, and to use a network dispatcher for load-balancing among servers.

[0052] The apparatus is designed to be a modular unit using an interface device or network processor, a Control Point (CP), and an optional switching fabric device as its fundamental building blocks. The interface device preferably provides L2/L3/L4+ fast-path forwarding services, while the CP provides the management and route discovery functions needed to maintain the fast-path. The optional switching fabric device is used when more than two interface device subsystems are tied together. The optional switching fabric device may be as disclosed in U.S. Pat. No. 5,008,878 issued Apr. 16, 1991 for High Speed Modular Switching Apparatus for Circuit and Packet Switched Traffic, mentioned hereinabove and incorporated herein by reference.

[0053] The apparatus is anticipated to be assembled using printed circuit board elements, also here mentioned as “blades”. The printed circuit board elements have circuit elements mounted thereon and are received in connectors provided in apparatus housings. Similar devices are also known as “option cards”. The apparatus contemplates that blades can be exchanged among varying chassis or housings, provided that appropriate connectors and backplane electrical connections are provided. The basic component found on all blades is a carrier subsystem. Starting with the carrier subsystem, three types of blades can be produced. The first type is a CP only Blade, which consists of a carrier subsystem and a CP subsystem. The primary use of a CP only blade is for a product where redundancy is the primary concern. The second type is a CP+Media Blade, which consists of a carrier subsystem, a CP subsystem, and 1-to-3 media subsystems. The primary use of a CP+Media blade is a product where port density is deemed more important than redundancy. The third type is a Media Blade, which consists of a carrier subsystem and 1-to-4 media subsystems. The media blades can be used in any chassis, and the type of media subsystem used is configurable.
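The three blade types differ only in which subsystems sit on the common carrier subsystem. A minimal Python sketch that classifies a blade from its contents, using the counts given above (CP only: no media subsystems; CP+Media: 1 to 3; Media: 1 to 4), might look as follows. The enum, table and function names are hypothetical, and every blade is assumed to carry exactly one carrier subsystem.

```python
from enum import Enum


class BladeType(Enum):
    CP_ONLY = "CP only"
    CP_MEDIA = "CP+Media"
    MEDIA = "Media"


# (CP subsystem present, min media subsystems, max media subsystems);
# the carrier subsystem is assumed present on every blade.
BLADE_RULES = {
    BladeType.CP_ONLY: (True, 0, 0),
    BladeType.CP_MEDIA: (True, 1, 3),
    BladeType.MEDIA: (False, 1, 4),
}


def classify_blade(has_cp: bool, media_subsystems: int) -> BladeType:
    """Return the blade type matching the given subsystem mix, or raise."""
    for blade_type, (cp, lo, hi) in BLADE_RULES.items():
        if has_cp == cp and lo <= media_subsystems <= hi:
            return blade_type
    raise ValueError(f"unsupported blade: cp={has_cp}, media={media_subsystems}")
```

A combination outside these rules, such as a CP subsystem with four media subsystems, is rejected rather than silently accepted.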

[0054] Blade management will involve fault detection, power management, new device detection, initialization, and configuration. This management will be done using various registers, I/O signals, and a guided cell interface that is used to communicate between the CP and carrier subsystems. However, unlike the chassis, all blades do contain programmable devices and memory. The amount of programmability depends on the type of blade. When the CP subsystem exists on a blade, both the CP and carrier subsystems are programmable. The media subsystems are also programmable, but only indirectly through the carrier subsystem.

[0055] In higher capability products there also exists a Switch Blade, which contains the switching fabric device subsystem. The management of this blade will involve fault detection, power management, new device detection, and initialization. This management will be done using various registers and I/O signals that will be mapped into the CP subsystem.

[0056] In its simplest form, a switch apparatus contemplated by this invention has a control point processor and an interface device operatively connected to the control point processor. Preferably, and as here disclosed, the interface device (also known as a network processor) is a unitary Very Large Scale Integrated (VLSI) circuit device or chip which has a semiconductor substrate; a plurality of interface processors formed on the substrate; internal instruction memory formed on said substrate and storing instructions accessibly to the interface processors; internal data memory formed on the substrate and storing data passing through the device accessibly to the interface processors; and a plurality of input/output ports. The interface processors are also sometimes herein identified as picoprocessors or processing units. The ports provided include at least one port connecting the internal data memory with external data memory and at least two other ports exchanging data passing through the interface device with an external network under the direction of the interface processors. The control point cooperates with the interface device by loading into the instruction memory instructions to be executed by the interface processors in directing the exchange of data between the data exchange input/output ports and the flow of data through the data memory.

[0057] The network processor here disclosed is deemed inventive apart from the switch assemblies into which it is incorporated. Further, the network processor here disclosed is deemed to have within its elements here described other and further inventions not here fully discussed.

[0058] FIG. 1 shows a block diagram for the interface device chip that includes substrate 10 and a plurality of sub-assemblies integrated on the substrate. The sub-assemblies are arranged into an Upside configuration and a Downside configuration. As used herein, “Upside” refers to data flows inbound from a network to the apparatus here disclosed, while “Downside” refers to data outbound from the apparatus to a network serviced by the apparatus. The data flow follows the respective configurations. As a consequence, there is an Upside data flow and a Downside data flow. The sub-assemblies in the Upside include Enqueue-Dequeue-Scheduling UP (EDS-UP) logic 16, multiplexed MAC's-UP (PPM-UP) 14, Switch Data Mover-UP (SDM-UP) 18, System Interface (SIF) 20, Data Align Serial Link A (DASLA) 22, and Data Align Serial Link B (DASLB) 24. A data align serial link is more fully described in copending U.S. patent application Ser. No. 09/330,968 filed Jun. 11, 1999 and entitled “High Speed Parallel/Serial Link for Data Communication”, mentioned hereinabove and incorporated by reference hereinto. While the preferred form of the apparatus of this invention here disclosed uses a DASL link, the present invention contemplates that other forms of links may be employed to achieve relatively high data flow rates, particularly where the data flows are restricted to being within the VLSI structure.

[0059] The sub-assemblies in the Downside include DASL-A 26, DASL-B 28, SIF 30, SDM-DN 32, EDS-DN 34, and PPM-DN 36. The chip also includes a plurality of internal S-RAMs, Traffic Mgt Scheduler 40, and Embedded Processor Complex (EPC) 12. An interface device 38 is coupled by respective DMU Busses to PMM 14 and 36. The interface 38 could be any suitable L1 circuitry, such as an Ethernet Physical (ENET PHY), an ATM Framer, etc. The type of interface is dictated in part by the network media to which the chip is connected. A plurality of external D-RAMs and S-RAMs are available for use by the chip.

[0060] While here particularly disclosed for networks in which the general data flow outside the relevant switching and routing devices is passed through electrical conductors such as wires and cables installed in buildings, the present invention contemplates that the network switches and components thereof here disclosed may be used in a wireless environment as well. By way of an illustrative example, the media access control (MAC) elements here described may be replaced by suitable radio frequency elements, possibly using known Silicon Germanium technology, which would result in a capability to link the elements here described directly to a wireless network. Where such technology is appropriately employed, the radio frequency elements can, by persons of appropriate skill in the applicable arts, be integrated into the VLSI structures here disclosed. Alternatively, radio frequency or otherwise wireless response devices, such as infrared responsive devices, can be mounted on a blade with other elements here disclosed to achieve a switch apparatus useful with wireless network systems.

[0061] The arrows show the general flow of data within the Interface device. Frames received from an Ethernet MAC are placed in internal Data Store buffers by the EDS-UP. These frames are identified as either normal Data Frames or system control Guided Frames and enqueued to the EPC (FIG. 1). The EPC contains N protocol processors capable of working on up to N frames in parallel (N>1). In an embodiment with ten protocol processors (FIG. 12B), two of the ten protocol processors are specialized: one for handling Guided Frames (the Generic Central Handler or GCH) and one for building Lookup Data in Control Memory (the Generic Tree Handler or GTH). As shown in FIG. 12A, the EPC also contains a dispatcher which matches new frames with idle processors; a completion unit which maintains frame sequence; a Common Instruction memory shared by all ten processors; a Classifier Hardware Assist coprocessor which determines frame classification and helps determine the starting instruction address of the frame; Ingress and Egress Data Store interfaces which control read and write operations of frame buffers; a Control Memory Arbiter which allows the ten processors to share Control Memory; a Web Control, Arbiter and interface that allows debug access to internal Interface device data structures; as well as other hardware constructs.
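The dispatcher's job of matching new frames with idle processors can be sketched as two queues, one of waiting frames and one of idle processors. This is a simplified Python model, assuming all N processors are interchangeable (it ignores the specialized GCH and GTH processors) and tagging each frame with a dispatch sequence number that a completion unit could later use to restore order; the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class Dispatcher:
    """Toy dispatcher pairing queued frames with idle protocol processors."""

    def __init__(self, n_processors: int = 10) -> None:
        if n_processors < 2:
            raise ValueError("the EPC is described with N > 1 processors")
        self.idle: Deque[int] = deque(range(n_processors))
        self.pending: Deque[Tuple[int, bytes]] = deque()
        self.busy: Dict[int, int] = {}  # processor id -> sequence number of its frame
        self.next_seq = 0

    def enqueue(self, frame: bytes) -> int:
        """Queue a frame and return the sequence number assigned to it."""
        seq = self.next_seq
        self.pending.append((seq, frame))
        self.next_seq += 1
        return seq

    def dispatch(self) -> List[Tuple[int, int]]:
        """Assign waiting frames to idle processors; return (processor, seq) pairs."""
        assigned = []
        while self.pending and self.idle:
            seq, _frame = self.pending.popleft()
            proc = self.idle.popleft()
            self.busy[proc] = seq
            assigned.append((proc, seq))
        return assigned

    def release(self, proc: int) -> int:
        """Mark a processor idle again; return the sequence number it finished."""
        seq = self.busy.pop(proc)
        self.idle.append(proc)
        return seq
```

Frames that arrive while every processor is busy simply wait in `pending` until a `release` frees one.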

[0062] Guided Frames are sent by the dispatcher to the GCH processor as it becomes available. Operations encoded in the Guided Frame are executed, such as register writes, counter reads, Ethernet MAC configuration changes, and so on. Lookup table alterations, such as adding MAC or IP entries, are passed on to the Lookup Data processor for Control Memory operations, such as memory reads and writes. Some commands, such as MIB counter reads, require a response frame to be built and forwarded to the appropriate port on the appropriate Interface device. In some cases, the Guided Frame is encoded for the Egress side of the Interface device. These frames are forwarded to the Egress side of the Interface device being queried, which then executes the encoded operations and builds any appropriate response frame.

[0063] Data frames are dispatched to the next available protocol processor for performing frame lookups. Frame data are passed to the protocol processor along with results from the Classifier Hardware Assist (CHA) Engine. The CHA parses IP or IPX. The results determine the Tree Search algorithm and the starting Common Instruction Address (CIA). Tree Search algorithms supported include Fixed Match Trees (fixed size patterns requiring exact match, such as Layer 2 Ethernet MAC tables), Longest Prefix Match Trees (variable length patterns requiring variable length matches, such as subnet IP forwarding) and Software Managed Trees (two patterns defining either a range or a bit mask set, such as used for filter rules).
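Of the three tree types, the Longest Prefix Match is the least obvious. The Python sketch below implements it with a plain one-bit-per-level binary trie over IPv4 addresses, purely to show the semantics (the most specific matching prefix wins). It is not the Tree Search Engine's actual data structure, and the class names are hypothetical. A Fixed Match lookup, by contrast, behaves like an exact-match dictionary keyed on the whole pattern.

```python
import ipaddress
from typing import List, Optional


class _Node:
    __slots__ = ("children", "value")

    def __init__(self) -> None:
        self.children: List[Optional["_Node"]] = [None, None]
        self.value: Optional[str] = None  # next hop stored where a prefix ends


class LongestPrefixMatch:
    """Binary trie keyed on address bits, most significant bit first."""

    def __init__(self) -> None:
        self.root = _Node()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            b = (bits >> (31 - i)) & 1
            if node.children[b] is None:
                node.children[b] = _Node()
            node = node.children[b]
        node.value = next_hop

    def lookup(self, address: str) -> Optional[str]:
        bits = int(ipaddress.IPv4Address(address))
        node = self.root
        best = node.value  # a /0 default route lives at the root
        for i in range(32):
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node.value is not None:
                best = node.value  # remember the longest prefix seen so far
        return best
```

With 10.0.0.0/8 and 10.1.0.0/16 both installed, 10.1.2.3 resolves to the /16 entry, while 10.2.0.1 falls back to the /8 entry.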

[0064] Lookup is performed with the aid of the Tree Search Engine (TSE) Coprocessor, which is a part of each protocol processor. The TSE Coprocessor performs Control memory accesses, freeing the protocol processor to continue execution. Control memory stores all tables, counters, and other data needed by the picocode. Control memory operations are managed by the Control memory Arbiter, which arbitrates memory access among the ten processor complexes.

[0065] Frame data are accessed through the Data Store Coprocessor. The Data Store Coprocessor contains a primary data buffer (holding up to eight 16-byte segments of frame data), a scratch pad data buffer (also holding up to eight 16-byte segments of frame data) and some control registers for Data Store operations. Once a match is found, Ingress frame alterations may include a VLAN header insertion or overlay. This alteration is not performed by the interface device processor complex; rather, hardware flags are derived and other Ingress Switch Interface hardware performs the alterations. Other frame alterations can be accomplished by the picocode and the Data Store Coprocessor by modifying the frame contents held in the Ingress Data Store.
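A VLAN header insertion of the kind mentioned above amounts to splicing an IEEE 802.1Q tag (TPID 0x8100 followed by a 16-bit Tag Control Information field) between the source MAC address and the EtherType. The following Python sketch shows only that byte-level operation on a frame without its FCS; the function name is hypothetical, and in the device itself the alteration is performed by Ingress Switch Interface hardware from flags, not by code like this.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag


def insert_vlan_tag(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Return a copy of `frame` (DA, SA, EtherType, payload; no FCS) with an 802.1Q tag."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7:
        raise ValueError("priority must fit in 3 bits")
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    tci = (pcp << 13) | vid  # drop-eligible (DEI) bit left at 0
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:12] + tag + frame[12:]  # tag goes right after DA (6) + SA (6)
```

Because the frame grows by four bytes, any frame check sequence has to be recomputed after the tag is inserted.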

[0066] Other data are gathered and used to build Switch Headers and Frame Headers prior to sending frames to the switch fabric device. Control data include switch information, such as the destination blade of the frame, as well as information for the Egress Interface device, helping it expedite frame lookup of destination ports, multicast or unicast operations, and Egress Frame alterations.

[0067] Upon completion, the Enqueue Coprocessor builds the necessary formats for enqueuing the frame to the switch fabric and sends them to the Completion Unit. The Completion Unit guarantees frame order from the ten protocol processors to the switch fabric queues. Frames from the switch fabric queues are segmented into 64-byte cells, with Frame Header bytes and Switch Header bytes inserted as they are transmitted to the Prizma-E Switch.
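The segmentation step can be sketched as follows. The text fixes only the 64-byte cell size; the header layout is not given here, so this Python sketch assumes, hypothetically, that a switch header of caller-chosen length leads every cell, that the frame header travels once ahead of the frame data, and that the final cell is zero-padded. The function name is likewise hypothetical.

```python
from typing import List

CELL_SIZE = 64  # bytes per switch-fabric cell, per the description


def segment_frame(frame: bytes, switch_header: bytes, frame_header: bytes) -> List[bytes]:
    """Split a frame into fixed-size cells (hypothetical header placement)."""
    room = CELL_SIZE - len(switch_header)  # payload bytes available per cell
    if room <= 0:
        raise ValueError("switch header leaves no room for payload")
    data = frame_header + frame  # frame header sent once, up front
    cells = []
    for offset in range(0, len(data), room):
        cell = switch_header + data[offset:offset + room]
        cells.append(cell.ljust(CELL_SIZE, b"\x00"))  # pad the last cell
    return cells
```

Fixed-size cells let the fabric schedule transfers in uniform time slots regardless of frame length, at the cost of padding in the last cell.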

[0068] Frames received from the switch fabric are placed in Egress Data Store (Egress DS) buffers by an Egress EDS (34) and enqueued to the EPC. A portion of the frame is sent by the dispatcher to an idle protocol processor for performing frame lookups. Frame data are dispatched to the protocol processor along with data from the Classifier Hardware Assist. The Classifier Hardware Assist uses frame control data created by the Ingress Interface device to help determine the beginning Code Instruction Address (CIA).

[0069] Egress Tree Searches support the same algorithms as supported for Ingress Searches. Lookup is performed with the TSE Coprocessor, freeing the protocol processor to continue execution. All Control memory operations are managed by the Control memory Arbiter, which allocates memory access among the ten processor complexes.

[0070] Egress frame data are accessed through the Data Store Coprocessor. The Data Store Coprocessor contains a primary data buffer (holding up to eight 16-byte segments of frame data), a scratch pad data buffer (also holding up to eight 16-byte segments of frame data) and some control registers for Data Store operations. The result of a successful lookup contains forwarding information and, in some cases, frame alteration information. Frame alterations can include VLAN header deletion, Time to Live increment (IPX) or decrement (IP), IP Header Checksum recalculation, Ethernet frame CRC overlay or insertion and MAC DA/SA overlay or insertion. IP Header checksums are prepared by the Checksum Coprocessor. Alterations are not performed by the Interface device Processor Complex; rather, hardware flags are created and PMM Egress hardware performs the alterations. Upon completion, the Enqueue Coprocessor is used to help build the necessary formats for enqueuing the frame in the EDS Egress queues and sending them to the Completion Unit. The Completion Unit guarantees frame order from the ten protocol processors to the EDS Egress queues feeding the egress Ethernet MACs 36.
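The MAC DA/SA overlay and Ethernet CRC overlay listed above can be illustrated together, since rewriting the addresses invalidates the frame check sequence. In this Python sketch (function name hypothetical), `zlib.crc32` is used because it computes the same CRC-32 as the Ethernet FCS; the sketch assumes the convention of storing the FCS least-significant byte first. In the device these alterations are carried out by PMM Egress hardware from flags, not in software.

```python
import zlib


def overlay_macs_and_fcs(frame: bytes, da: bytes, sa: bytes) -> bytes:
    """Replace DA/SA in a frame that ends with a 4-byte FCS, then recompute the FCS."""
    if len(da) != 6 or len(sa) != 6:
        raise ValueError("MAC addresses are 6 bytes")
    if len(frame) < 12 + 4:
        raise ValueError("frame too short to hold DA, SA and FCS")
    body = da + sa + frame[12:-4]  # drop old addresses and old FCS
    fcs = zlib.crc32(body).to_bytes(4, "little")  # CRC-32, least-significant byte first
    return body + fcs
```

Ordering matters here: every alteration that changes frame bytes must be applied before the FCS is recomputed, which is why CRC overlay is naturally the last step.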

[0071] The completed frames are finally sent by PMM Egress hardware to the Ethernet MACs and out the Ethernet ports.

[0072] An internal bus, referred to as the Web, allows access to internal registers, counters and memory. The Web also includes an external interface to control instruction step and interrupt control for debugging and diagnostics.

[0073] The Tree Search Engine coprocessor provides memory range checking and illegal memory access notification, and performs tree search instructions (such as memory read, write or read-add-write) operating in parallel with protocol processor execution.

[0074] Common Instruction Memory consists of one 1024×128 RAM and two sets of Dual 512×128 RAM. Each set of Dual RAMs provides two copies of the same picocode, allowing processors independent access to instructions within the same address range. Each 128-bit word includes four 32-bit instructions, providing a total range of 8192 instructions.
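The 8192-instruction figure follows if the duplicated copies in each Dual RAM set are counted once, since they hold identical picocode. The Python arithmetic below makes that reading explicit; it is an interpretation of the stated sizes, not a statement about the physical address map.

```python
WORD_BITS = 128          # width of every instruction RAM
INSTRUCTION_BITS = 32

single_ram_words = 1024  # one 1024x128 RAM
dual_sets = 2            # two sets of Dual 512x128 RAMs
words_per_dual_set = 512 # both copies hold the same picocode: count once

unique_words = single_ram_words + dual_sets * words_per_dual_set
instructions_per_word = WORD_BITS // INSTRUCTION_BITS
total_instructions = unique_words * instructions_per_word

print(unique_words, instructions_per_word, total_instructions)  # 2048 4 8192
```

Counting both copies of each Dual set would give 3072 words and 12288 instructions, which does not match the stated range; the duplication buys access bandwidth, not capacity.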

[0075] The Dispatcher controls the passing of frames to the ten protocol processors and manages interrupts and timers.

[0076] The Completion Unit guarantees frame order from the processor complex to the switch fabric and target port queues. A rich instruction set includes conditional execution, packing (for input hash keys), conditional branching, signed and unsigned operations, counts of leading zeros and more.
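The ordering guarantee of the Completion Unit can be modeled as a reorder buffer: frames finish on different processors in any order, but are released only in the order in which they were dispatched. The Python sketch below assumes a single global sequence number per frame (the actual unit may track order per flow or per queue) and uses hypothetical names.

```python
import heapq
from typing import List, Tuple


class CompletionUnit:
    """Release completed frames strictly in dispatch-sequence order."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._held: List[Tuple[int, str]] = []  # min-heap of (seq, frame)

    def complete(self, seq: int, frame: str) -> List[str]:
        """Record a finished frame; return every frame that can now be released."""
        heapq.heappush(self._held, (seq, frame))
        released = []
        while self._held and self._held[0][0] == self._next_seq:
            released.append(heapq.heappop(self._held)[1])
            self._next_seq += 1
        return released
```

A frame that finishes early is held until every earlier frame has completed, so a slow lookup on one processor delays release of later frames but never reorders them.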

[0077] The Classifier Hardware Assist engine parses each frame's layer 2 and layer 3 protocol headers and provides this information with frames as they are dispatched to the protocol processors.

[0078] The Control memory Arbiter controls processor access to both internal and external memory.

[0079] External Control memory options include 5 to 7 DDR DRAM subsystems, each supporting a pair of 2M×16 bit×4 bank or a pair of 4M×16 bit×4 bank DDR DRAMs. The DDR DRAM interface runs at a 133 MHz clock rate with a 266 MHz data strobe and supports configurable CAS latency and drive strength. An optional 133 MHz ZBT SRAM can be added in a 128K×36, 2×256K×18 or 2×512K×18 configuration.

[0080] Egress frames may be stored in either one External Data Buffer (e.g. DS0) or two External Data Buffers (DS0 and DS1). Each Buffer can be comprised of a pair of 2M×16 bit×4 bank DDR DRAMs (storing up to 256K 64-byte frames) or a pair of 4M×16 bit×4 bank DDR DRAMs (storing up to 512K 64-byte frames). A single External Data Buffer (e.g. DS0) supports 2.28 Mpps of layer 2 and layer 3 switching; adding the second Buffer (e.g. DS1) supports 4.57 Mpps. Adding the second Buffer improves performance, but it does not increase frame capacity. The External Data Buffer interface runs at a 133 MHz clock rate with a 266 MHz data strobe and supports configurable CAS latency and drive strength.

[0081] Internal Control memory includes two 512×128 bit RAMs, two 1024×36 bit RAMs and one 1024×64 bit RAM.

[0082] Internal Data storage provides buffering for up to 2048 64-byte frames in the Ingress direction (UP).

[0083] Fixed Frame alterations include VLAN tag insertions in the Ingress direction and VLAN tag deletions, Time To Live increment/decrement (IP, IPX), Ethernet CRC overlay/insert and MAC DA/SA overlay/insert in the Egress (DOWN) direction.
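VLAN tag insertion and deletion amount to splicing a 4-byte IEEE 802.1Q tag in or out immediately after the MAC DA and SA. The following Python sketch assumes standard 802.1Q tags (TPID 0x8100) and frames handled without their trailing CRC; the function names are hypothetical and do not describe the device's alteration hardware.

```python
import struct

TPID_8021Q = 0x8100   # IEEE 802.1Q Tag Protocol Identifier
MAC_ADDRS = 12        # MAC DA (6 bytes) + MAC SA (6 bytes)


def insert_vlan_tag(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the MAC SA (frame given without its CRC)."""
    if not 0 <= vid <= 0xFFF or not 0 <= pcp <= 7:
        raise ValueError("VID must fit in 12 bits and priority in 3 bits")
    tci = (pcp << 13) | vid                     # drop-eligible bit left at 0
    tag = struct.pack("!HH", TPID_8021Q, tci)   # network (big endian) byte order
    return frame[:MAC_ADDRS] + tag + frame[MAC_ADDRS:]


def delete_vlan_tag(frame: bytes) -> bytes:
    """Remove an 802.1Q tag if present; untagged frames are returned unchanged."""
    (tpid,) = struct.unpack_from("!H", frame, MAC_ADDRS)
    if tpid != TPID_8021Q:
        return frame
    return frame[:MAC_ADDRS] + frame[MAC_ADDRS + 4:]
```

Either alteration changes the frame contents, so the Ethernet CRC must afterwards be overlaid or inserted, which is the separate CRC alteration listed in the same paragraph.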

[0084] Port mirroring allows one receive port and one transmit port to be copied to a system-designated observation port without using protocol processor resources. Mirrored Interface device ports are configured to add frame and switch control data. A separate data path allows direct frame enqueuing to the Ingress Switch interface.

[0085] The interface device integrates four Ethernet macros. Each macro can be individually configured to operate in either 1 Gigabit or 10/100 Fast Ethernet mode. Each Ethernet macro supports the following:

[0086] Up to ten 10/100 Mbps MACs or one 1000 Mbps MAC for each of four macros.

[0087] FIG. 1A shows a block diagram of the MAC core. Each macro includes three Ethernet Core designs; to wit, the multiport 10/100 Mbps MAC Core (Fenet), the 1000 Mbps MAC core (Genet) and the 1000 Mbps Physical Coding Sublayer Core (PCS). Multi-Port Ethernet 10/100 MAC Features:

[0088] Supports ten Serial Medium Independent Interfaces to the physical layer

[0089] Capable of handling ten ports of 10 Mbps or 100 Mbps media speeds, any speed mix

[0090] A single MAC services all ten ports with a Time Division Multiplex interface

[0091] Supports Full/Half duplex operations at media speed on all ports

[0092] Supports IEEE 802.3 Binary Exponential Backoff
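The IEEE 802.3 backoff rule is named above but not stated. A minimal Python sketch of the standard's truncated binary exponential backoff, using the 802.3 constants for half-duplex 10/100 Mbps operation; how the Fenet core draws its random values is not described here, and the function names are hypothetical.

```python
import random

SLOT_TIME_BITS = 512   # IEEE 802.3 slot time for 10/100 Mbps, in bit times
BACKOFF_LIMIT = 10     # the exponent stops growing after the 10th collision
ATTEMPT_LIMIT = 16     # the frame is abandoned after 16 collisions


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Slot times to wait after the n-th collision of the same frame."""
    if collisions < 1:
        raise ValueError("backoff is only computed after a collision")
    if collisions >= ATTEMPT_LIMIT:
        raise RuntimeError("excessive collisions: frame discarded")
    k = min(collisions, BACKOFF_LIMIT)
    return rng.randint(0, 2 ** k - 1)   # uniform integer in [0, 2^k - 1]


def backoff_bit_times(collisions: int, rng: random.Random) -> int:
    """The same delay expressed in bit times."""
    return backoff_slots(collisions, rng) * SLOT_TIME_BITS
```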

[0093] 1000 Mbps Ethernet MAC Core Features:

[0094] Supports Gigabit Medium Independent Interface (GMII) to the physical PCS layer or directly to the physical layer

[0095] With the PCS Core, supports a complete TBI (8b/10b) solution

[0096] Supports Full duplex Point-to-Point connections at media speed

[0097] Supports the IBM PCS Core valid byte signalling

[0098] 1000 Mbps Ethernet Physical Coding Sublayer Core Features:

[0099] Performs 8b/10b encoding and decoding

[0100] Supports the PMA (10 bit) Service Interface as defined in IEEE 802.3z; this interface attaches to any PMA that is compliant with IEEE 802.3z

[0101] Synchronizes data received from the PMA (two phase clock) with the MAC (single phase) clock

[0102] Supports Auto-Negotiation including two next pages

[0103] Converts from a two phase clock system defined in the standards to a single phase clock

[0104] Provides a signal to the MAC indicating those clock cycles that contain new data

[0105] Checks the received code groups (10 bits) for commas and establishes word sync

[0106] Calculates and checks the 8b/10b running disparity
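The comma search and running-disparity check listed above can be sketched in Python. This assumes code groups given as '0'/'1' strings in transmission order (bit a first) and follows the IEEE 802.3 sub-block disparity rules; full code-group validation against the 8b/10b tables is omitted, and the function names are hypothetical.

```python
COMMA_PATTERNS = ("0011111", "1100000")  # 7-bit comma sequences, bit a first


def find_comma(bits: str):
    """Return the offset of the first comma in a received bit string, or None.

    Word sync aligns 10-bit code-group boundaries to this offset.
    """
    offsets = [bits.find(p) for p in COMMA_PATTERNS]
    offsets = [i for i in offsets if i >= 0]
    return min(offsets) if offsets else None


def subblock_rd(sub: str, rd: int) -> int:
    """IEEE 802.3 running disparity (+1/-1) at the end of a 6-bit or 4-bit sub-block."""
    ones = sub.count("1")
    zeros = len(sub) - ones
    pos_special, neg_special = ("000111", "111000") if len(sub) == 6 else ("0011", "1100")
    if ones > zeros or sub == pos_special:
        return +1
    if zeros > ones or sub == neg_special:
        return -1
    return rd                                 # other balanced sub-blocks keep the disparity


def check_disparity(groups, rd: int = -1):
    """Track running disparity over 10-bit code groups (abcdei fghj).

    Returns (final_rd, indices of code groups with a disparity error).
    """
    errors = []
    for idx, group in enumerate(groups):
        bad = False
        for sub in (group[:6], group[6:]):
            imbalance = 2 * sub.count("1") - len(sub)    # ones minus zeros
            # An unbalanced sub-block must push the disparity toward the opposite sign.
            if abs(imbalance) > 2 or imbalance * rd > 0:
                bad = True
            rd = subblock_rd(sub, rd)
        if bad:
            errors.append(idx)
    return rd, errors
```

For example, K28.5 sent in its RD- form (0011111010) and then its RD+ form (1100000101) returns the disparity to negative without error, whereas repeating the RD- form is flagged as a disparity error.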

[0107] FIGS. 2A-2D show different configurations for the Interface device chip. The configurations are facilitated by DASL and connection to a switching fabric device. Each DASL includes two channels; namely, a transmit channel and a receive channel.

[0108] FIG. 2A shows a wrap configuration for a single Interface device. In this configuration, the transmit channel is wrapped to the receive channel.

[0109] FIG. 2B shows the configuration in which two Interface device chips are connected. Each Interface device chip is provided with at least two DASLs. In this configuration, the channels on one DASL on one chip are operatively connected to the channels of a matching DASL on the other chip. The other DASL on each chip is wrapped.

[0110] FIG. 2C shows the configuration in which multiple Interface devices are connected to a switch fabric. The double-headed arrows indicate transmission in both directions.

[0111] FIG. 2D shows the configuration in which a Main switch and a Backup switch are connected to multiple Interface devices. If the main switch goes down, the backup is available for use.

[0112] A Control Point (CP) includes a System Processor that is connected to each of the configurations. The system processor at the CP, among other things, provides initialization and configuration services to the chip. The CP may be located in any of three locations: in the interface device chip; on the blade on which the chip is mounted; or external to the blade. If external to the blade, the CP may be remote; that is, housed elsewhere and communicating over the network to which the interface device and CP are attached. The elements of a CP are shown in FIG. 17 and include memory elements (cache, flash and SDRAM), a memory controller, a PCI bus, and connectors for a backplane and for L1 network media.

[0113] FIG. 18 shows the single-chip Network Processor and the functions provided by the EDS-UP, the traffic Management (MGT) Scheduler and the EDS-DOWN (DN). The U-shaped icons represent queues, and the Control Blocks (CB) that keep track of the contents of the queues are represented by rectangular icons.

[0114] A description of the elements, their respective functions and interaction follows.

[0115] PMM: This is the part of the Network Processor that contains the MACs (FEnet, POS, GEnet) and attaches to the external PHY devices.

[0116] UP-PMM: This logic takes bytes from the PHYs and formats them into FISH (16 bytes) to pass on to the UP-EDS. There are 4 DMUs within the PMM, each capable of working with 1 GEnet or 10 FEnet devices.

[0117] UP-EDS: This logic takes the FISH from UP-PMM and stores them into the UP-Data Store (internal RAM). It is capable of working on 40 frames at once, and after the appropriate number of bytes are received, it enqueues the frame to the EPC. When the EPC is finished with the frame, the UP-EDS enqueues the frame into the appropriate Target Port Queue and starts sending the frame to the UP-SDM. The UP-EDS is responsible for all buffer and frame management and returns the buffers/frames to the free pools when the transfer to UP-SDM is complete.

[0118] EPC: This logic contains the picoprocessors and may contain the embedded PowerPC. This logic is capable of looking at the frame header and deciding what to do with the frame (forward, modify, filter, etc.). The EPC has access to several lookup tables and to hardware assists that allow the picoprocessors to keep up with the high-bandwidth requirements of the Network Processor.

[0119] UP-SDM: This logic takes the frames and formats them into PRIZMA cells for transmission to the switch fabric. This logic is also capable of inserting the VLAN header into the frame.

[0120] UP-SIF: This logic contains the UP-DASL macros and attaches to the external switch I/Os.

[0121] DN-SIF: This logic contains the DN-DASL macros and receives PRIZMA cells from the external I/Os.

[0122] DN-SDM: This logic receives the PRIZMA cells and preprocesses them to help in frame reassembly.

[0123] DN-EDS: This logic takes the cells and assembles them back into frames. Each cell is stored into the external Data Store, and buffers are linked together to make frames. When the entire frame is received, the frame is enqueued to the EPC. After the EPC is finished with the frame, it is enqueued to the Scheduler (if present) or the Target Port Queues. DN-EDS then sends the frames to the appropriate port by sending the frame, any alteration information, and some control information to the DN-PMM.

[0124] DN-PMM: Takes the information from DN-EDS, formats the frame into Ethernet, POS, etc. and sends the frame to the external PHY.

[0125] SPM: This logic allows the Network Processor to interface to external devices (PHYs, LEDs, FLASH, etc.) while requiring only 3 I/Os. The Network Processor uses a serial interface to communicate with the SPM, and the SPM then performs the necessary functions to manage these external devices.

[0126] UP-SIDE Flow

[0127] 1) Frames arrive at the PHY

[0128] 2) Bytes are received by UP-PMM

[0129] 3) UP-PMM sends FISH over to UP-EDS (a FISH is a 16-byte portion of a frame)

[0130] 4) UP-EDS stores FISH into UP-DS

[0131] 5) UP-EDS sends header over to EPC

[0132] 6) EPC processes header and sends enqueue information back to UP-EDS

[0133] 7) UP-EDS continues to receive the remainder of frame from UP-PMM

[0134] 8) UP-EDS sends information to UP-SDM when appropriate data is ready to send to the switch

[0135] 9) UP-SDM reads frame data and formats it into PRIZMA cells

[0136] 10) UP-SDM sends cells to UP-SIF

[0137] 11) UP-SIF transfers the cells over the DASL serial links to PRIZMA

[0138] 12) UP-EDS frees up buffers/frames when all the data has been taken

[0139] DN-SIDE Flow

[0140] 1) DN-SIF receives PRIZMA cells

[0141] 2) DN-SDM stores cells and preprocesses them for reassembly information

[0142] 3) DN-EDS receives the cell data and reassembly information and links the cell into a new frame on the down side

[0143] 4) DN-EDS stores the cell into DN-DS

[0144] 5) DN-EDS enqueues the frame to EPC when all of the data has been received

[0145] 6) EPC processes the header and sends enqueue information back to DN-EDS

[0146] 7) DN-EDS enqueues the frame into a scheduler queue (if present) or a Target Port Queue

[0147] 8) DN-EDS services the queues and sends frame information into the PCB

[0148] 9) DN-EDS uses the PCB to “unravel” the frame, reads the appropriate data and sends that data to DN-PMM

[0149] 10) DN-PMM formats the data (with alteration if requested) and sends the frame to the external PHY

[0150] 11) DN-PMM informs DN-EDS when buffers are no longer needed and DN-EDS frees these resources

[0151] FRAME Control Flow

[0152] 1) Header is sent to EPC from UP-DS or DN-DS

[0153] 2) EPC looks up header information in lookup tables and receives frame enqueue information

[0154] 3) EPC sends the enqueue information back to the EDS and the frame is enqueued to the appropriate queue

[0155] 4) Cell Headers and Frame Headers are sent along with the frame data to aid in reassembly and frame forwarding

[0156] CP Control Flow

[0157] 1) Control Point formats a Guided Frame and sends it to the Network Processor

[0158] 2) The Network Processor enqueues the Guided Frame to the GCH picoprocessor

[0159] 3) The GCH processes the Guided Frame and reads or writes the requested areas of Rainier

[0160] 4) The GCH passes any Table update requests over to the GTH

[0161] 5) The GTH updates the appropriate table with information from the Guided Frame

[0162] 6) An acknowledgement Guided Frame is sent back to CP

[0163] Network Processor Control Flow

[0164] 1) A Picoprocessor can build a Guided Frame to send information to another Rainier or the Control Point

[0165] 2) The Guided Frame is sent to the appropriate location for processing

[0166] A single Interface device provides media speed switching for up to 40 Fast Ethernet Ports (FIG. 2A). 80 Fast Ethernet Ports are supported when two Interface devices are interconnected using IBM's Data Aligned Synchronous Link (DASL) technology (FIG. 2B). Each DASL differential pair carries 440 Mbps of data. Two sets of eight pairs provide a 3.5 Gbps duplex connection (8 times 440 Mbps in each direction). As shown in FIGS. 2C and 2D, larger systems can be built by interconnecting multiple Interface devices to a switch such as IBM's Prizma-E switch. The Interface device provides two of the 3.5 Gbps duplex DASL connections, one primary and one secondary, which can be used to provide a wrap-back path for local frame traffic (when two Interface devices are directly connected, FIG. 2B) or a connection to a redundant switch fabric (FIG. 2D, Backup Sw.). In view of the above, the single Network Processor chip is scalable in that one chip can be used to provide anything from a low-end system (having relatively low port density, say 40 ports) to a high-end system (having relatively high port density, say 80-n ports).
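The 3.5 Gbps figure is the rounded product of the per-pair rate and the pair count, with one set of eight pairs serving each direction:

```latex
\begin{aligned}
\text{per direction} &= 8 \times 440\ \text{Mbps} = 3520\ \text{Mbps} \approx 3.5\ \text{Gbps},\\
\text{pairs per duplex connection} &= 2 \times 8 = 16.
\end{aligned}
```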

[0167] One Interface device in the system is connected to the system processor via up to ten 10/100 Mbps Fast Ethernet ports or a single 1000 Mbps Ethernet port. The Ethernet configuration to the system processor is placed in an EEPROM attached to the Interface device and loaded during initialization. The system processor communicates with all Interface devices in a system (see FIG. 2) by building special Guided Frames encapsulated as Ethernet frames. The encapsulated Guided Frames are forwarded across the DASL link to other devices, allowing all of the Interface devices in the system to be controlled from a single point.

[0168] Guided Frames are used to communicate control information between the Control Point (CP) and the Embedded Processor Complex and within the interface device. A prior disclosure of Guided Cells which will elucidate the discussion here is found in U.S. Pat. No. 5,724,348 issued Mar. 3, 1998 for “Efficient Hardware/Software Interface for a Data Switch”, mentioned hereinabove and incorporated herein by reference.

[0169] For Guided Frame traffic that originates at the CP, the CP constructs the Guided Frame in data buffers in its local memory. The CP's Device Driver sends the Guided Frame to one of the media interfaces of the Network Processor. Media Access Control (MAC) hardware recovers the Guided Frame and stores it in its internal data store (U_DS) memory. The Guided Frame is routed to the appropriate blade, processed, and routed back to the CP as required. Guided Frames passing between an external CP and the interface device are encapsulated to adapt to the protocol of the external network. As a consequence, if the external network includes Ethernet, the Guided Frames are encapsulated as Ethernet frames, and so forth.

[0170] Ethernet encapsulation provides a means of transport for Guided Traffic between the CP and the Interface device. The Ethernet MAC (Enet MAC) of the Interface device does not analyze the Destination Address (DA) or Source Address (SA) when receiving frames; this analysis is performed by the EPC picocode. Guided Traffic presumes that the Interface device has not been configured, so the DA and SA cannot be analyzed by the EPC picocode. Therefore, these frames are inherently self-routing. The Enet MAC does, however, analyze the Ethernet Type field to distinguish Guided Traffic from Data Traffic. The Ethernet Type value of the Guided Frame must match the value loaded into the E_Type_C Register. This register is loaded from Flash Memory by the Interface device's boot picocode.

[0171] The CP constructs the Guided Frame in data buffers in its local memory. The contents of a 32-bit register in the CP processor are stored in big endian format in the local memory as shown in FIG. 3. Having constructed the Guided Frame, the CP's Device Driver sends an Ethernet frame containing a DA for a specific Guided Cell Handler (GCH), an SA corresponding to the global MAC address for the CP or the MAC address for a specific interface, a special Ethernet Type field that indicates a Guided Frame, and the Guided Frame Data. All Ethernet frames arriving on the port are received and analyzed by the Enet MAC. For frames with an Ethernet Type value matching the contents of the E_Type_C Register, the Enet MAC strips off the DA, SA and Ethernet Type fields and stores the Guided Frame data into the U_DS memory. Bytes are collected by the Enet MAC one at a time into a block of 16 bytes called a Fish. These bytes are stored in big endian format with the first byte of the Guided Frame stored in the most significant byte location of the Fish (Byte 0). Succeeding bytes are stored in successive byte locations within the Fish (Byte 1, Byte 2, . . . , Byte 15). These 16 bytes are then stored in a Buffer in the U_DS beginning at the Fish 0 location. Succeeding Fishes are stored in successive Fish locations within the Buffer (Fish 1, Fish 2, Fish 3, etc.). Additional Buffers are obtained from a free pool as required to store the remainder of the Guided Frame.
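The receive path just described (Ethernet Type match, header strip, big-endian packing into 16-byte Fish, buffer chaining) can be sketched in Python as follows. The E_Type_C value is passed in because the text does not give it, the four-Fish buffer size is an assumption, zero-padding the last Fish is for illustration only, and the function name is hypothetical.

```python
import struct

FISH_BYTES = 16        # one Fish, per paragraph [0171]
FISH_PER_BUFFER = 4    # ASSUMPTION: the buffer size is not stated in this paragraph
ETH_HEADER = 14        # DA (6) + SA (6) + Ethernet Type (2)


def receive_guided(frame: bytes, e_type_c: int):
    """Return Guided Frame data as a list of buffers of Fish, or None for Data Traffic.

    e_type_c stands in for the contents of the E_Type_C Register.
    """
    (ether_type,) = struct.unpack_from("!H", frame, 12)
    if ether_type != e_type_c:
        return None                               # ordinary Data Traffic
    data = frame[ETH_HEADER:]                     # strip DA, SA and Ethernet Type
    # Byte 0 of each Fish holds the earliest byte, so each Fish is big endian.
    fishes = [data[i:i + FISH_BYTES].ljust(FISH_BYTES, b"\x00")
              for i in range(0, len(data), FISH_BYTES)]
    # Fill Fish 0, Fish 1, ... of one buffer, then take the next buffer from the free pool.
    return [fishes[i:i + FISH_PER_BUFFER]
            for i in range(0, len(fishes), FISH_PER_BUFFER)]
```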

[0172] The flow of guided traffic within the interface device 10 is shown in FIG. 4. The Enet MAC function of the Interface device examines the frame header information and determines that the frame is a Guided Frame. The Enet MAC removes the frame header from the Guided Frame and buffers the remainder of its contents in the Interface device's internal U_DS memory. The Enet MAC indicates that the frame is to be enqueued to the General Control (GC) Queue for processing by the GCH. When the end of the Guided Frame has been reached, the Enqueue, Dequeue, and Schedule (EDS) logic enqueues the frame into the GC Queue.

[0173] The GCH picocode on the blade locally attached to the CP examines the Frame Control Information (see FIG. 6) to determine whether the Guided Frame is intended for other blades in the system and whether the Guided Frame is to be executed on the down side of the Interface device. If the frame is intended for blades other than or in addition to the locally attached blade, the GCH picocode updates the TB value in the Frame Control Block (FCB) with the TB value from the Guided Frame's Frame Control information and instructs the EDS to enqueue the frame in the multicast Target Blade Start of Frame (TB_SOF) Queue. For performance reasons, all Guided Traffic is enqueued to the multicast TB_SOF queue independent of the number of destination blades indicated.

[0174] If the frame is intended for only the locally attached blade, the GCH picocode examines the up/down field of the Frame Control information to determine whether the Guided Frame is to be executed on the up or down side of the Interface device (see FIG. 6). If the Guided Frame is to be executed on the down side of the Interface device, the GCH picocode updates the TB value in the FCB with the TB value from the Guided Frame's Frame Control information and instructs the EDS to enqueue the frame in the multicast Target Blade Start of Frame (TB_SOF) Queue. If the Frame Control information indicates that the Guided Frame is to be executed on the up side, the GCH picocode analyzes the Guided Frame and performs the operations indicated by the Guided Commands it contains.
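The up-side GCH decision in paragraphs [0173] and [0174] reduces to two tests. A minimal Python sketch; the text does not spell out how "intended for only the locally attached blade" is tested, so the sketch assumes TB equal to '0000'h or to My_TB, and all names are hypothetical.

```python
from enum import Enum


class UpAction(Enum):
    ENQUEUE_TB_SOF = "update FCB TB and enqueue in the multicast TB_SOF Queue"
    EXECUTE_HERE = "execute the Guided Commands on the up side"


def gch_up_side(tb: int, my_tb: int, up: bool) -> UpAction:
    """Up-side GCH decision for a Guided Frame arriving from the locally attached CP."""
    local_only = tb in (0x0000, my_tb)       # ASSUMPTION: how "local only" is tested
    if not local_only:
        return UpAction.ENQUEUE_TB_SOF       # other blades are always reached via TB_SOF
    if not up:
        return UpAction.ENQUEUE_TB_SOF       # local, but executed on the down side
    return UpAction.EXECUTE_HERE
```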

[0175] Prior to processing of Guided Commands, the picocode checks the value of the ack/noack field of the Frame Control information. If this value is ‘0’b, then the Guided Frame is discarded following processing. Guided read commands shall not be of this category.

[0176] If the value of the ack/noack field is ‘1’b and the value of the early/late field is ‘1’b, then prior to processing any of the Guided Commands in the Guided Frame, the picocode constructs an Early Ack Guided Frame with the value of the TB field of the Frame Control equal to the contents of the My_TB Register. The picocode routes the Early Ack Guided Frame back to the CP by updating the TB value in the frame's FCB with the value contained in the TB field of the LAN Control Point Address (LAN_CP_Addr) Register and instructing the EDS to enqueue the frame in the multicast TB_SOF Queue. The picocode then processes the Guided Commands of the Guided Frame and discards the Guided Frame. Guided read commands shall not be of this category.

[0177] If, on the other hand, the value of the ack/noack field is ‘1’b and the value of the early/late field is ‘0’b, the picocode changes the resp/req field of the Frame Control information to ‘1’b to indicate a Guided Frame response, replaces the TB field with the contents of the My_TB Register, and processes each Guided Command within the Guided Frame. During the course of processing a Guided Command, the picocode updates the Completion Code field of the next Guided Command with the completion status code value for the current Guided Command. The picocode routes the response back to the source by updating the TB value in the FCB with the value corresponding to the Source Blade (LAN_CP_Addr Register value for CP) and instructing the EDS to enqueue the frame in the multicast TB_SOF Queue.
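The three acknowledgment cases in paragraphs [0175] through [0177] can be written as a small decision function. This Python sketch returns the ordered actions as strings for readability; the field model and names are hypothetical, and the read-command check encodes the rule that reads may not appear in noack or early-ack frames.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameControl:
    """Subset of the Frame Control information used by the acknowledgment logic."""
    resp: bool    # resp/req:   True = response ('1'b), False = request
    ack: bool     # ack/noack:  True = acknowledge
    early: bool   # early/late: True = acknowledge before processing
    tb: int       # Target Blade field


def up_side_ack_plan(fc: FrameControl, has_read: bool, my_tb: int, lan_cp_tb: int) -> List[str]:
    """Ordered actions for a request executed on the up side of the CP's local blade."""
    if has_read and (not fc.ack or fc.early):
        raise ValueError("Guided read commands need ack/noack='1'b and early/late='0'b")
    if not fc.ack:                                   # noack: process, then discard
        return ["process Guided Commands", "discard Guided Frame"]
    if fc.early:                                     # the Early Ack goes out first
        return [f"build Early Ack Guided Frame with TB={my_tb:#06x}",
                f"enqueue Early Ack in TB_SOF with FCB TB={lan_cp_tb:#06x}",
                "process Guided Commands",
                "discard Guided Frame"]
    fc.resp, fc.tb = True, my_tb                     # late ack: the frame becomes the response
    return ["process Guided Commands, chaining Completion Codes",
            f"enqueue response in TB_SOF with FCB TB={lan_cp_tb:#06x}"]
```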

[0178] Frames residing in the TB_SOF Queue are scheduled for forwarding by the EDS. The Switch Data Mover (SDM) builds the switching fabric Cell Header and Interface device Frame Header from the information contained in the FCB. These cells pass through the switching fabric device and arrive at the target blade, where the cells are reassembled into a frame in the D_DS memory. The SDM of the down side recognizes that the frame is a Guided Frame and signals the EDS to enqueue it in the GC Queue.

[0179] Pressure from the GC Queue or the GT Queue stimulates the picocode to access and analyze the Guided Frames. All Guided Frames arriving on the down side are initially enqueued in the GC Queue. The gch/gth value of the Frame Control Information for these frames is examined by the GCH picocode. If the gch/gth value is ‘0’b, the Guided Frame is enqueued in the GT Queue. Otherwise, the GCH picocode examines the resp/req field of the Frame Control information to determine if the Guided Frame has already been executed. If resp/req has a value of ‘1’b, then the Guided Frame has already been executed and is routed to the CP. Target port values corresponding to CP connections are maintained by EPC picocode. Frames from these Target Port queues are transmitted from the Interface device back to the CP.

[0180] If the resp/req field has a value of ‘0’b, then the blade may be local or remote with respect to the CP. This is resolved by comparing the value of the TB field of the LAN_CP_Addr Register with the contents of the My Target Blade (My_TB) Register. If they match, then the blade is local to the CP; otherwise, the blade is remote from the CP. In either case, the picocode examines the up/down value of the Frame Control Information. If up/down is equal to ‘1’b, then the frame is enqueued in the Wrap TP queue for forwarding to the U_DS and processing by the GCH on the up side. Otherwise, the picocode (GCH or GTH) performs the operations indicated by the Guided Commands contained in the Guided Frame. Prior to processing of the Guided Commands, the picocode checks the value of the ack/noack field of the Frame Control information. If this value is ‘0’b, then the Guided Frame is discarded following processing. Guided read commands shall not be of this category.
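The down-side routing in paragraphs [0179] and [0180] can be sketched as below. The gch/gth bit is modeled as a boolean to_gth rather than a raw bit value, the local/remote test mirrors the LAN_CP_Addr versus My_TB comparison, and all names are hypothetical.

```python
from enum import Enum


class DownAction(Enum):
    GT_QUEUE = "enqueue in GT Queue"
    CP_PORT_QUEUE = "enqueue in Port Queue corresponding to the CP"
    WRAP_TP_QUEUE = "enqueue in Wrap TP queue for up-side GCH processing"
    EXECUTE = "execute Guided Commands on the down side"


def gch_down_side(to_gth: bool, resp: bool, up: bool) -> DownAction:
    """Down-side GCH decision for a Guided Frame taken from the GC Queue."""
    if to_gth:
        return DownAction.GT_QUEUE         # GTH work is handed to the GT Queue
    if resp:
        return DownAction.CP_PORT_QUEUE    # already executed: return it to the CP
    if up:
        return DownAction.WRAP_TP_QUEUE    # request meant for the up side
    return DownAction.EXECUTE


def ack_route(lan_cp_tb: int, my_tb: int) -> str:
    """Where a down-side acknowledgment goes: straight to the CP only on its local blade."""
    return "CP Port Queue" if lan_cp_tb == my_tb else "Wrap Port"
```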

[0181] If the value of the ack/noack field is ‘1’b and the value of the early/late field is ‘1’b, then prior to processing any of the Guided Commands in the Guided Frame, the picocode constructs an Early Ack Guided Frame with the value of the TB field of the Frame Control information equal to the contents of the My_TB Register. If the blade is remote from the CP, the picocode routes the Early Ack Guided Frame to the Wrap Port. Otherwise, the blade is local to the CP and the frame is routed to the Port Queue corresponding to the CP. The picocode processes the Guided Commands while either the Wrap Port moves the Early Ack Guided Frame from the D_DS to the U_DS and enqueues the frame in the GC Queue on the up side, or the frame is transmitted from the Port Queue back to the CP. For frames wrapped back to the U_DS, the GCH picocode again sees this frame, but the resp/req field will have a value of ‘1’b. The GCH picocode routes the frame back to the CP by updating the TB field in the FCB with the value contained in the TB field of the LAN_CP_Addr Register and instructing the EDS to enqueue the frame in the multicast TB_SOF Queue. Frames residing in the TB_SOF Queue are scheduled for forwarding by the EDS. The SDM builds the Prizma Cell Header and Interface device Frame header from information contained in the FCB. Cells from this frame pass through Prizma and are reassembled into a frame on the CP's local blade. The SDM of the down side recognizes that the frame is a Guided Frame and signals the EDS to enqueue it in the GC Queue. This time, when the GCH picocode analyzes the frame, the resp/req field has a value of ‘1’b. This implies that this blade is locally attached to the CP, and the Guided Frame is routed to the Port Queue corresponding to the CP. Frames from this queue are transmitted from the Interface device back to the CP.

[0182] If, on the other hand, the value of the ack/noack field is ‘1’b and the value of the early/late field is ‘0’b, the picocode changes the resp/req field to ‘1’b to indicate a Guided Frame response, replaces the TB field with the contents of the My_TB Register, and then processes each Guided Command within the Guided Frame. During the course of processing a Guided Command, the picocode updates the Completion Code field of the next Guided Command with the completion status code value for the current Guided Command. If the blade is remote from the CP, then the picocode routes the Guided Frame to the Wrap Port. Otherwise, the blade is local to the CP and the frame is routed to the Port Queue corresponding to the CP. Either the Wrap Port moves the Guided Frame from the D_DS to the U_DS and enqueues the frame in the GC Queue on the up side, or the frame is transmitted from the Port Queue back to the CP. For frames wrapped back to the U_DS, the GCH picocode again sees this frame, but the resp/req field will have a value of ‘1’b. The GCH picocode routes the frame back to the CP by updating the TB field in the FCB with the value contained in the TB field of the LAN_CP_Addr Register and instructing the EDS to enqueue the frame in the multicast TB_SOF Queue. Frames residing in the TB_SOF Queue are scheduled for forwarding by the EDS. The SDM builds the Prizma Cell Header and Interface device Frame header from information contained in the FCB. Cells from this frame pass through Prizma and are reassembled into a frame on the down side of the CP's local blade. The SDM of the down side recognizes that the frame is a Guided Frame and signals the EDS to enqueue it in the GC Queue. This time, when the GCH picocode analyzes the frame from the D_DS, the resp/req field has a value of ‘1’b. This implies that this blade is locally attached to the CP, and the Guided Frame is routed to the Port Queue corresponding to the CP. Frames from this queue are transmitted from the Interface device back to the CP.

[0183] If, for any reason, the GCH picocode encounters a Guided Frame with the TB field of the Frame Control information equal to ‘0000’h, then the GCH picocode interprets the frame as intended for only this blade and acts accordingly. This action is required during initialization, when the value of the My_TB Register is ‘0000’h for all blades. The CP initializes the My_TB Register of the locally attached blade by sending a Write Guided Command in a Guided Frame whose Frame Control Information has a TB value of ‘0000’h.

[0184] Any of the picoprocessors within the EPC can generate a Guided Frame. This frame can be the Unsolicited Guided Frame or any other form of Guided Frame. Internally generated frames of this type are constructed in a way that does not allow acknowledgment (i.e. ack/noack=‘0’b). These frames may be sent to one of the two picoprocessors (GCH or GTH) within the same EPC or to the GCH or GTH of some other blade.

[0185] Unsolicited Guided Frames may also be sent to the CP. Guided Frames destined for the same EPC are constructed using data buffers in the D_DS. These frames are then enqueued in the GC or GT Queue for processing, and are then processed and discarded in the usual manner. Unsolicited Guided Frames destined for the locally attached CP are constructed using data buffers in the D_DS. These frames are constructed in a way that indicates that they have been executed by the EPC (i.e. resp/req=‘1’b, and TB=My_TB). These frames are enqueued in the Port Queue corresponding to the CP. Frames from this queue are transmitted back to the CP.

[0186] Guided Frames destined for another blade can be constructed using data buffers in the D_DS or the U_DS. Unsolicited Guided Frames destined for the CP are constructed in a way that indicates that they have been executed by the EPC (i.e. resp/req=‘1’b, and TB=My_TB). Frames constructed using buffers from the D_DS are enqueued to the Wrap Port. These frames are moved to the U_DS and enqueued to the GC Queue on the up side. Unsolicited Guided Frames with a resp/req value of ‘1’b are routed to the CP using the TB value in the LAN_CP_Addr Register. Otherwise, the GCH picocode routes these frames using the TB value of the Frame Control Information of the Guided Frame. At the receiving blade, the frame is enqueued to the GC Queue of the down side. The GCH of this blade executes and discards the frame (resp/req=‘0’b and gch/gth=‘1’b), or enqueues the frame to the GT Queue (resp/req=‘0’b and gch/gth=‘0’b), or enqueues the frame to the Port Queue corresponding to the CP (resp/req=‘1’b). Frames constructed using data buffers in the U_DS are enqueued directly into the GC Queue of the up side. From this point forward, these frames follow the same route and are handled in the same way as those constructed using D_DS data buffers. FIG. 5 shows the generalized format for guided frames.
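Paragraph [0186] routes EPC-built Guided Frames differently depending on where their buffers live and whether they are responses. A minimal Python sketch listing the hops such a frame takes; the hop strings and the function name are illustrative only.

```python
from typing import List


def unsolicited_route(built_in: str, resp: bool, frame_tb: int, lan_cp_tb: int) -> List[str]:
    """Hops taken by an EPC-built Guided Frame headed for another blade or the CP."""
    if built_in not in ("D_DS", "U_DS"):
        raise ValueError("Guided Frames are built in D_DS or U_DS buffers")
    hops = []
    if built_in == "D_DS":
        hops.append("Wrap Port: move frame from D_DS to U_DS")
    hops.append("GC Queue, up side")
    # Responses follow LAN_CP_Addr; requests follow the frame's own TB field.
    target_tb = lan_cp_tb if resp else frame_tb
    hops.append(f"multicast TB_SOF Queue, FCB TB={target_tb:#06x}")
    hops.append("GC Queue, down side of the receiving blade")
    return hops
```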

[0187] The format shown is a logical representation with the most significant byte on the left and the least significant byte on the right. Four-byte words begin with word 0 at the top and increase towards the bottom of the page.

[0188] Since Guided Frames must be routed and processed before the interface device has been configured by the CP, these frames must be self-routing. The results normally obtained by look-up and classification are contained in the Frame Control information field of the Guided Frame, allowing the chip to update the FCB with this information without performing a look-up operation. The target blade information contained in the Guided Frame is used by the Guided Frame Handler to prepare the Leaf Page field of the FCB. The CP provides the Target Blade information while the GCH picocode fills in the other fields in the FCB. This FCB information is used by the SDM to prepare the Cell and Frame headers. The format of the Frame Control information field of the Guided Frame is shown in FIG. 6.

[0189] An explanation of the abbreviation at each bit position in FIG. 6 follows:

[0190] resp/req: Response and Not Request indicator value. This field is used to differentiate between request (unprocessed) and response Guided Frames.

[0191] 0 request

[0192] 1 response

[0193] ack/noack: Acknowledgment or No Acknowledgment control value. This field is used to control whether (ack) or not (noack) the GCH picocode acknowledges the Guided Frame. Guided Frames that are not to be acknowledged shall not contain any form of Guided Command that performs a read.

[0194] 0 No Acknowledgment

[0195] 1 Acknowledgment

[0196] early/late: Early and Late Acknowledgment control value. This field is used to control whether the acknowledgment requested (ack/noack=‘1’b) occurs before (early) or after (late) the Guided Frame has been processed. This field is ignored when ack/noack=‘0’b.

[0197] 0 Acknowledge after Guided Frame processing

[0198] 1 Acknowledge before Guided Frame processing

[0199] neg/all: Negative Acknowledgment or Acknowledge All control value. This field is ignored when the ack/noack field has a value of ‘0’b, unless a Guided Command does not complete successfully.

[0200] 0 Acknowledge all Guided Frames if ack/noack=‘1’b. Early or Late Acknowledgment is determined by the value of early/late.

[0201] 1 Acknowledge only Guided Frames that do not complete successfully. This acknowledgment occurs independent of the values of ack/noack and early/late and is necessarily a late acknowledgment.

[0202] up/down: Up or Down control value. This value is used to control whether the frame is processed on the up side or the down side. This field is ignored when resp/req is ‘1’b. All multicast Guided Frames shall have an up/down value of ‘0’b. In addition, Guided Commands that require the use of GTH hardware assist instructions shall have an up/down value of ‘0’b.

[0203] 0 Down side processing

[0204] 1 Up side processing

[0205] gth/gch: General Tree Handler or Guided Cell Handler control value. This value is used to direct Guided Frames to the proper picoprocessor.

[0206] 0 GCH picoprocessor

[0207] 1 GTH picoprocessor

[0208] TB Target Blade value. When resp/r̅e̅q̅ is ‘0’b, this field contains routing information used by Prizma. Each bit position corresponds to a Target Blade. If this value is ‘0000’h, then the Guided Frame is assumed to be for this blade and is executed accordingly. A value of ‘1’b in one or more bit positions of the TB field indicates that the cell is routed to the corresponding Target Blade(s). When resp/r̅e̅q̅ is ‘1’b, the field contains the My_TB value of the responding blade.
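The TB field can be decoded as a bit map of destination blades. This minimal Python sketch assumes that TB is 16 bits wide (as the ‘0000’h value suggests) and that bit i, counted from the least significant bit, corresponds to Target Blade i; the actual bit-to-blade assignment is not stated here, and `target_blades` is a hypothetical helper.

```python
def target_blades(tb: int) -> list[int]:
    """Return the Target Blade numbers selected by a 16-bit TB field.

    Assumption: bit i (counted from the least significant bit) selects blade i.
    An empty list means TB == 0x0000, i.e. the Guided Frame is for this blade.
    """
    if not 0 <= tb <= 0xFFFF:
        raise ValueError("TB is assumed to be a 16-bit field")
    return [blade for blade in range(16) if (tb >> blade) & 1]


print(target_blades(0x0005))  # [0, 2]
```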

[0209] Word 1 of the Guided Frame contains a correlator value (FIG. 7). This value is assigned by the CP software to correlate Guided Frame responses with their requests. The correlator includes a plurality of bits with assigned functions.

[0210] Every Guided Command begins with a Command Control Information field. This Command Control field contains information that aids the GCH picocode in processing a Guided Frame. The format for this information is shown in FIG. 8.

[0211] Length value: This value indicates the total number of 32-bit words contained in the Control Information (Cmd Word 0), the Address Information (Cmd Word 1), and the Operand (Cmd Words 2+) portions of the Guided Frame.

[0212] Completion Code value: This field is initialized by the CP and is modified by the GCH picocode when processing Guided Commands. The GCH picocode uses this field to report the completion status of the preceding Guided Command in the command list. Since all Guided Command lists terminate with the End Delimiter Guided Command, the completion status of the last command is contained in the End Delimiter's Completion Code field.

[0213] Guided Command Type values:

Symbolic Name         Type Value   Type Description
End_Delimiter         0000         mark the end of a Guided Frame sequence
Build_TSE_Free_List   0001         build a free list
Software_Action       0010         execute software action
Unsolicited           0011         frames initiated by the EPC picocode
Block_Write           0100         write a block of data to consecutive addresses
Duplicate_Write       0101         write duplicate data to registers or memory
Read                  0110         request and respond for reading register or memory data
(reserved)            0111
Insert_Leaf           1000         insert a leaf into the search tree
Update_Leaf           1001         update a leaf of the search tree
Read_Leaf             1010         request and respond for reading of Leaf Page data
(reserved)            1011
Delete_Leaf           1100         delete a leaf of the search tree
(reserved)            1101-1111

[0214] The addressing information contained in the Guided Frame identifies an element within the Networking Processor's addressing scheme. The general form of the Address Information field is shown in FIG. 9.

[0215] The Interface device employs a 32-bit addressing scheme. This addressing scheme assigns an address value to every accessible structure of the Interface device. These structures are either internal to the Processor or connected to interfaces under the control of the Processor. Some of these structures are accessed by the Embedded Processor Complex (EPC) via an internal interface called the Web Interface. The remainder of the structures are accessed via memory controller interfaces. In all cases the general form of the address is shown in FIG. 10.

[0216] The Network Controller is subdivided into major chip islands. Each island is given a unique Island ID value. This 5-bit Island ID value forms the 5 most significant bits of the address for structures controlled by that chip island. The correspondence between encoded Island ID value and chip island name is shown in FIG. 11. The second portion of the Web address consists of the next 23 most significant bits. This address field is segmented into a structure address portion and an element address portion. The number of bits used for each segment may vary from island to island: some islands may contain only a few large structures while others may contain many small structures, so there is no fixed size for these address segments. The structure address portion is used to address an array within the island, while the element address portion is used to address an element within the array. The remaining 4 bits of the address accommodate the Web Interface's 32-bit data bus limitation. This 4-bit word address is used for selecting 32-bit segments of the addressed element, which is necessary for moving structure elements wider than 32 bits across the Network Controller's Web Data Bus. Word address value ‘0’h refers to the 32 most significant bits of the structure element, while sequential word address values correspond to successively less significant segments of the structure element. The word address portion of the address is not required for structures not accessed via the Web Interface. For this reason, the Up Data Store, Control Memories, and Down Data Store make use of the entire 27 least significant bits of address to access structure elements. Another exception to this format is the address for the SPM Interface. In that case all 27 bits of address are used and no element is greater than 32 bits in width.
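The Web address layout described above (5-bit Island ID, 23-bit structure/element field, 4-bit word address) can be illustrated with a short Python sketch. Because the structure/element split varies by island, it is passed in as a parameter, `element_bits`, which is a hypothetical name; the helper functions are illustrative only and do not cover the 27-bit Data Store or SPM formats.

```python
from typing import NamedTuple


class WebAddress(NamedTuple):
    island_id: int   # 5 most significant bits
    structure: int   # structure (array) portion of the 23-bit field
    element: int     # element portion of the 23-bit field
    word: int        # 4-bit word address; 0 selects the 32 MSBs of the element


def decode_web_address(addr: int, element_bits: int) -> WebAddress:
    """Split a 32-bit Web address; element_bits is island-specific (hypothetical)."""
    if not 0 <= addr < 1 << 32:
        raise ValueError("address must fit in 32 bits")
    if not 0 <= element_bits <= 23:
        raise ValueError("element portion must fit in the 23-bit field")
    island_id = addr >> 27                    # bits 31..27
    middle = (addr >> 4) & ((1 << 23) - 1)    # bits 26..4
    word = addr & 0xF                         # bits 3..0
    structure = middle >> element_bits
    element = middle & ((1 << element_bits) - 1)
    return WebAddress(island_id, structure, element, word)


def encode_web_address(island_id: int, structure: int, element: int,
                       word: int, element_bits: int) -> int:
    """Inverse of decode_web_address for in-range field values."""
    middle = (structure << element_bits) | element
    return (island_id << 27) | (middle << 4) | word
```

For example, `decode_web_address(encode_web_address(3, 5, 0x2A, 1, 10), 10)` recovers the original fields.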

[0217] The Embedded Processing Complex (EPC) provides and controls the programmability of the Interface device chip. It includes the following components (see also FIG. 12A):

[0218] N processing units, called GxH: The GxHs concurrently execute picocode that is stored in a common Instruction Memory. Each GxH consists of a Processing Unit core, called CLP, which contains a 3-stage pipeline, 16 GPRs and an ALU. Each GxH also contains several coprocessors, such as the Tree Search Engine. The GxH is disclosed separately.

[0219] Instruction Memory: Loaded during initialization; contains the picocode for forwarding frames and managing the system.

[0220] A Dispatcher: Dequeues frame addresses from the up and down dispatcher queues. After dequeue, the Dispatcher pre-fetches part of the frame header from the up or down DataStore (DS) and stores it in an internal memory. As soon as a GxH becomes idle, the Dispatcher passes the frame header with appropriate control information, such as the Code Instruction Address (CIA), to the GxH. The Dispatcher also handles timers and interrupts.

[0221] A Tree Search Memory (TSM) Arbiter: There are a number of shared internal and external memory locations available to each GxH. Since this memory is shared, an arbiter is used to control access to it. The TSM can be accessed directly by the picocode, which can, for example, use it to store aging tables. The TSM is also accessed by the TSE during tree searches.

[0222] The Completion Unit (CU): The Completion Unit performs two functions. First, it interfaces the N Processing Units to the Up and Dn EDS (Enqueue, Dequeue and Schedule Island). The EDS performs the enqueue action: a frame address, together with appropriate parameters called the FCBPage, is queued in either a transmission queue, a discard queue, or a dispatcher queue. Second, the Completion Unit guarantees frame sequence. Since multiple GxHs may be processing frames that belong to the same flow, precautions must be taken so that these frames are enqueued in the up or dn transmission queues in the right order. The Completion Unit uses a label that is generated by the Classifier Hardware Assist upon frame dispatch.
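The sequencing function of the Completion Unit can be pictured as a per-flow reorder buffer. In this minimal Python sketch the Classifier Hardware Assist label is modeled, as an assumption, as a per-flow sequence number starting at 0; the class and method names are hypothetical and the real label format is not specified here.

```python
import heapq


class CompletionUnit:
    """Release each flow's frames in dispatch order, whatever order GxHs finish in."""

    def __init__(self) -> None:
        self.next_label: dict[str, int] = {}                   # next label to release, per flow
        self.pending: dict[str, list[tuple[int, str]]] = {}    # finished but held back

    def complete(self, flow: str, label: int, frame: str) -> list[str]:
        """Record that a GxH finished `frame`; return the frames enqueued now, in order."""
        heap = self.pending.setdefault(flow, [])
        heapq.heappush(heap, (label, frame))
        expected = self.next_label.get(flow, 0)
        released = []
        while heap and heap[0][0] == expected:   # release the contiguous run
            released.append(heapq.heappop(heap)[1])
            expected += 1
        self.next_label[flow] = expected
        return released
```

If frame 1 of a flow finishes before frame 0, it is held until frame 0 completes, and both are then enqueued together in order; frames of other flows are unaffected.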

[0223] Classifier Hardware Assist: For up-frames, the Classifier Hardware Assist provides a classification for well-known cases of frame formats. Classification results are passed to the GxH, during frame dispatch, in terms of the CIA and the contents of one or more registers. For dn-frames, the Classifier Hardware Assist determines the CIA, depending on the frame header. For both up and dn frame dispatches, the Classifier Hardware Assist generates a label that is used by the Completion Unit to maintain frame sequence.

[0224] Up and dn DataStore Interface and arbiter: Each GxH has access to the up and dn DataStore: read access is provided when reading “more Fish” and write access is provided when writing back the contents of the Fish Pool to the DataStore. Since there are N Processing Units, and only one of them at a time can access the up DataStore and only one at a time can access the dn DataStore, one arbiter for each DataStore is required.

[0225] WEB Arbiter and WEBWatch interface: The WEB Arbiter arbitrates among the GxHs for access to the WEB. All GxHs have access to the WEB, which allows access to all memory and register functions in the Interface device. This allows any GxH to modify or read all configuration areas. The WEB can be thought of as the Interface device memory map. The WEBWatch interface provides access to the entire WEB from outside the chip using 3 chip IOs.

[0226] Debug, Interrupts and Single Step Control: The WEB allows the GCH or WEBWatch to control each GxH on the chip when necessary. For example, the WEB can be used by the GCH or WEBWatch to single-step instructions on a GxH.

[0227] An embedded general purpose processor, such as a PowerPC.

[0228] There are three types of GxH (FIG. 12B):

[0229] GDH (General Data Handler). There are eight GDHs. Each GDH has a full CLP with the five coprocessors (which are described in the next section). The GDHs are mainly used for forwarding frames.

[0230] GCH (Guided Cell Handler). The GCH has exactly the same hardware as a GDH. However, a guided frame can only be processed by the GCH. Whether the GCH is also enabled to process data frames (in which case it takes the role of a GDH) is programmable via the WEB (CLP_Ena register). The GCH has additional hardware compared to the GDH: hardware assist to perform tree inserts and deletes. The GCH is used to execute guided-cell related picocode, to perform chip and tree management related picocode such as aging, and to exchange control information with the CP and/or another GCH. When there is no such task to perform, the GCH executes frame forwarding related picocode and in this case behaves exactly like a GDH.

[0231] GTH (General Processor Tree Handler). The GTH has access to the hardware mailbox that connects to the PowerPC. The GTH has additional hardware assist to perform tree inserts, tree deletes and rope management. The GTH processes data frames when there are no frames (containing tree management commands) in the GPQ.

[0232] The number of GxHs (ten) is a “best guess”. Performance evaluation will determine how many GxHs are actually required. The architecture and structure are completely scalable towards more GxHs; the only limitation is the amount of silicon area (which must then also accommodate a larger arbiter and instruction memory).

[0233] Each GxH is structured as shown in FIG. 12C. In addition to the CLP with General Purpose Registers (GPR) and Arithmetic Logic Unit (ALU), each GxH contains the following five coprocessors:

[0234] Datastore (DS) Coprocessor Interface. Interfaces to the Dispatcher and to the sub-islands that provide read and write access to the up and dn DataStores. The DS Interface contains the so-called FishPool.

[0235] The Tree Search Engine Coprocessor (TSE). The TSE performs searches in the trees and also interfaces to the Tree Search Memory (TSM).

[0236] Enqueue Coprocessor. Interfaces to the Completion Unit Interface and contains the FCBPage. This coprocessor contains a 256-bit register with additional hardware assist that the picocode must use to build the FCBPage, which contains the enqueue parameters. Once the FCBPage is built, the picoprocessor can execute an enqueue instruction, which causes this coprocessor to forward the FCBPage to the Completion Unit.

[0237] WEB Interface Coprocessor. This coprocessor provides an interface to the WEB Arbiter and allows reading from and writing to the Interface device WEB.

[0238] Checksum Coprocessor. Generates checksums on frames stored in the Fishpool (described hereinafter).

[0239] The Processing Units are shared between ingress processing and egress processing. The amount of bandwidth reserved for ingress processing versus egress processing is programmable. In the current implementation, there are two modes: 50/50 (i.e., ingress and egress get the same bandwidth) or 66/34 (i.e., ingress gets twice as much bandwidth as egress).
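One way to picture the two bandwidth modes is as a repeating slot pattern. This Python sketch is an illustration only: it models 66/34 as a 2:1 ingress-to-egress pattern, the function names are hypothetical, and the actual hardware mechanism for reserving bandwidth is not described here.

```python
from itertools import cycle, islice

# Repeating slot patterns for the two programmable modes (illustrative model).
PATTERNS = {
    "50/50": ["ingress", "egress"],
    "66/34": ["ingress", "ingress", "egress"],   # ingress gets twice egress
}


def ingress_egress_schedule(mode: str):
    """Endless iterator of 'ingress'/'egress' processing slots for a mode."""
    return cycle(PATTERNS[mode])


def first_slots(mode: str, count: int) -> list[str]:
    """The first `count` slots of the schedule."""
    return list(islice(ingress_egress_schedule(mode), count))


print(first_slots("66/34", 6))
```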

[0240] Operation of the Processing Units is event-driven. That is, frame arrival is treated as an event, as is the popping of a timer or an interrupt. The dispatcher treats different events in an identical fashion, though there is a priority (first interrupts, then timer events, and finally frame arrival events). When an event is handed to a Processing Unit, appropriate information is given to the Processing Unit. For frame arrival events, this includes part of the frame header and information coming from the hardware classifier. For timers and interrupts, this includes the code entry point and other information that relates to the event.
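The event priority described above can be sketched as a dispatcher that always serves the highest-priority non-empty event queue. This Python model is illustrative; the `Dispatcher` class and the string event kinds are assumptions, not the hardware interface.

```python
from collections import deque

# Lower number = served first: interrupts, then timer events, then frame arrivals.
PRIORITY = {"interrupt": 0, "timer": 1, "frame": 2}


class Dispatcher:
    """Hand pending events to idle Processing Units in priority order.

    Events of the same kind keep FIFO order within their own queue.
    """

    def __init__(self) -> None:
        self.queues = {kind: deque() for kind in PRIORITY}

    def post(self, kind: str, payload: object) -> None:
        self.queues[kind].append(payload)

    def next_event(self):
        """Return (kind, payload) for the next idle GxH, or None if nothing is pending."""
        for kind in sorted(PRIORITY, key=PRIORITY.get):
            if self.queues[kind]:
                return kind, self.queues[kind].popleft()
        return None
```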

[0241] When a frame arrives on the ingress side and the number of received bytes of this frame has exceeded a programmable threshold, the address of the frame control block is written into a GQ.

[0242] When a complete frame has been reassembled on the egress side, the frame address is written into a GQ. There are four types of GQs (and for each type, FIG. 12B, there is an ingress version and an egress version):

[0243] GCQ: contains frames that must be processed by the GCH.

[0244] GTQ: contains frames that must be processed by the GTH.

[0245] GPQ: contains frames that must be processed by the GPH.

[0246] GDQ: contains frames that can be processed by any GDH (or GCH/GTH when they are enabled to process data frames). For the GDQ, there are multiple priorities, whereby frames enqueued in a higher priority GDQ are processed before frames enqueued in a lower priority queue.
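The queue-to-handler relationships listed above can be summarized in a short Python sketch of how an idle GxH might choose its next frame. The `pick_frame` function, the `data_enabled` flag (standing in for the CLP_Ena setting) and the rule that a specialized handler serves its own queue before the GDQs are assumptions for illustration, the last one suggested by the GCH description.

```python
from collections import deque

# Dedicated GQ served by each specialized handler type.
DEDICATED = {"GCH": "GCQ", "GTH": "GTQ", "GPH": "GPQ"}


def pick_frame(handler: str, queues: dict, gdq: list, data_enabled: bool = True):
    """Return the next frame address for an idle handler, or None.

    queues: dedicated GQs by name, each a deque.
    gdq: list of deques, index 0 = highest-priority GDQ.
    data_enabled: hypothetical stand-in for the CLP_Ena setting of a GCH/GTH.
    """
    own = DEDICATED.get(handler)
    if own and queues[own]:
        return queues[own].popleft()
    may_forward = handler == "GDH" or (handler in ("GCH", "GTH") and data_enabled)
    if may_forward:
        for level in gdq:              # higher-priority GDQs first
            if level:
                return level.popleft()
    return None
```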

[0247] Some Processing Units may be specialized. In the current implementation, there are four types of Processing Units (GxH) (see also FIG. 12B):

[0248] GDH (General Data Handler). The GDHs are mainly used forforwarding frames.

[0249] GCH (Guided Cell Handler). The GCH has exactly the same hardware as the GDH. However, a guided frame can only be processed by the GCH. Whether the GCH is also enabled to process data frames (in which case it takes the role of a GDH) is programmable via the WEB (CLP_Ena register).

[0250] GTH (General Tree Handler). The GTH has additional hardware compared to the GDH/GCH: hardware assist to perform tree inserts, tree deletes and rope management. The GTH processes data frames when there are no frames (containing tree management commands) in the GPQ.

[0251] GPH (General PowerPC Handler). The GPH has additional hardware compared to the GDH/GTH. The GPH interfaces to the embedded PowerPC by means of a mailbox interface.

[0252] In an actual implementation, the roles of GCH, GTH and GPH can be implemented on a single Processing Unit. For example, one implementation could have one Processing Unit for GCH and GPH. A similar comment holds for the GCQ, GTQ and GPQ.

[0253] The purpose of the Datastore Coprocessor is:

[0254] To interface to the Up DataStore, which contains frames that have been received from the media, and to the Down DataStore, which contains reassembled frames received from the Prizma Atlantic.

[0255] The Datastore Coprocessor also receives configuration information during the dispatch of a timer event or interrupt.

[0256] The Datastore Coprocessor is able to calculate checksums on frames.

[0257] The Datastore Coprocessor contains a FishPool (that can hold 8 fish), a ScratchMem (that can hold 8 fish) and some control registers to read/write FishPool contents from/to the up or down datastore. The FishPool can be seen as a work area for the Datastore: instead of reading/writing directly to a Datastore, a larger amount of frame data is read from the Datastore into the Fishpool or a larger amount of data is written from the Fishpool into the Datastore. The unit of transfer is a Fish, which equals 16 bytes.

[0258] The Fishpool can be seen as a memory that can contain 8 fish, that is, 8 words of 128 bits each. In the CLP processor architecture, the Fishpool is a register array of 128 bytes. Each byte in the Fishpool has a 7-bit byte address (0..127), and access is on a 16-bit or 32-bit basis. Like all register arrays, the Fishpool has a circular addressing scheme. That is, addressing a word (i.e., four bytes) starting at location 126 in the Fishpool returns bytes 126, 127, 0 and 1. Furthermore, from a Datastore Coprocessor point of view, fish locations in the Fishpool have a 3-bit fish address.

[0259] Upon frame dispatch, the first N fish of a frame are automatically copied into the Fishpool by the Dispatcher. The value of N is programmable in the PortConfigMemory. Typically, N equals 4 for up frame dispatch, 2 for dn unicast frame dispatch, 4 for dn multicast frame dispatch and 0 for interrupts and timers.

[0260] The picocode can read more bytes from a frame, in which case the Datastore Coprocessor automatically reads the frame data into the Fishpool at the next fish address, wrapping automatically to 0 when the boundary of the Fishpool has been reached. The picocode can also read or write the up/down datastore at an absolute address.
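The Fishpool behavior in paragraphs [0257] to [0260] (16-byte fish, 8 fish, 7-bit circular byte addressing, fish-level wrap to 0) can be modeled in a few lines of Python. This is a behavioral sketch under those stated parameters; the `FishPool` class and its methods are hypothetical, not the coprocessor's interface.

```python
FISH_BYTES = 16                      # one fish = 16 bytes
POOL_FISH = 8                        # the Fishpool holds 8 fish
POOL_BYTES = FISH_BYTES * POOL_FISH  # 128 bytes, 7-bit byte addresses


class FishPool:
    """Behavioral model of the Fishpool's circular addressing."""

    def __init__(self) -> None:
        self.mem = bytearray(POOL_BYTES)
        self.next_fish = 0           # 3-bit fish address for the next datastore read

    def load_fish(self, data: bytes) -> None:
        """Copy whole fish of frame data, wrapping to fish address 0 at the boundary."""
        if len(data) % FISH_BYTES:
            raise ValueError("transfers are in whole fish")
        for off in range(0, len(data), FISH_BYTES):
            start = self.next_fish * FISH_BYTES
            self.mem[start:start + FISH_BYTES] = data[off:off + FISH_BYTES]
            self.next_fish = (self.next_fish + 1) % POOL_FISH

    def read(self, byte_addr: int, width: int) -> bytes:
        """Read a 2- or 4-byte item starting at byte_addr, wrapping modulo 128."""
        if width not in (2, 4):
            raise ValueError("access is on a 16-bit or 32-bit basis")
        return bytes(self.mem[(byte_addr + i) % POOL_BYTES] for i in range(width))
```

Reading a 4-byte word at byte address 126 returns bytes 126, 127, 0 and 1, as in the example of paragraph [0258].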

[0261] The WEB Coprocessor interfaces to the EPC WEB Arbiter. The EPC WEB Arbiter arbitrates among the ten GxHs and the WEBWatch to become a master on the Interface device WEB interface. This allows all GxHs to read and write on the WEB.

[0262] The interface device memory complex provides storage facilities for the Embedded Processing Complex (EPC) (FIG. 12A). The memory complex includes the Tree Search Memory (TSM) Arbiter and a plurality of on-chip and off-chip memories. The memories store tree structures, counters and anything else that the picocode requires memory access for. Furthermore, the memories are used to store data structures that are used by the hardware, such as free lists and queue control blocks. Any memory location which is not allocated for trees or used by the hardware is by default available for picocode use, such as counters and aging tables.

[0263] FIG. 13 shows a more detailed block diagram of the memory complex. The Tree Search Memory (TSM) Arbiter provides the communication link between the Embedded Processors (GxH) and the memories. The memories include 5 on-chip SRAMs, 1 off-chip SRAM, and 7 off-chip DRAMs. The TSM Arbiter includes ten Request Control Units (each one connected to one of the Embedded Processors GxH) and 13 memory arbiter units, one for each memory. A bus structure interconnects the Request Control Units and the arbiter units in such a way that each control unit and its connected GxH have access to all memories.

[0264] The control unit includes the necessary hardware to steer data between the Embedded Processor (GxH) and the arbiters.

[0265] The SRAM arbiter units, among other things, manage the flow of data between the Embedded Processor GxH and the on-chip and off-chip SRAMs.

[0266] The DRAM arbiter units, among other things, manage the flow of data between the Embedded Processor (GxH) and the off-chip DRAM devices.

[0267] Each memory arbiter contains a “back-door” access, which is typically used by other parts of the chip and has the highest access priority.

[0268] The DRAM Memories can run in two modes of operation:

[0269] TDM mode. Memory access to the four banks in the DDRAM alternates between read windows and write windows: in a read window, access to any of the four banks is read-only, and in a write window, access to any of the four banks is write-only. Using TDM mode for multiple DDRAMs allows some control signals to be shared between the DDRAMs, which saves some chip IOs (a very scarce resource).

[0270] Non-TDM mode. Memory access to the four banks in the DDRAM can be a combination of reads and writes (which must follow some rules described in the DDRAM Arbiter Disclosure). For example, one can do a read in bank A and a write in bank C within one access window.

[0271] The TSM Arbiter allows N Requesters simultaneous access to M memories. When multiple Requesters want to access the same memory, round-robin arbitration is performed.

[0272] The M memories can have different properties. In our current implementation, there are three memory types: internal SRAM, external SRAM and external DDRAM.

[0273] The M memories and N Requesters are homogeneous: any Requestercan access any memory.

[0274] Some memories are logically divided into multiple sub-memories (such as the four banks in the DDRAM), which can be logically accessed simultaneously.

[0275] Part of the M memories are used as control memories containing internally used data structures, which have high-priority access compared to the picoprocessors. This also allows debugging of the chip, since the picoprocessors can read the contents of the control memories.

[0276] The arbiter supports read access, write access and read-add-write, whereby an N-bit integer is added to the contents of the memory in an atomic operation.

[0277] A general address scheme is used to access the M memories, such that the physical location of an object in the memory is transparent.
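A single memory arbiter with round-robin arbitration, a highest-priority back-door port and an atomic read-add-write can be sketched in Python as follows. One such arbiter would exist per memory; the class name, the one-grant-per-call model, the dict-backed memory, the 32-bit default width and the choice to return the updated value from read-add-write are all assumptions.

```python
class MemoryArbiter:
    """Round-robin arbitration of N requesters for one memory (illustrative sketch)."""

    def __init__(self, n_requesters: int) -> None:
        self.n = n_requesters
        self.last = n_requesters - 1     # so requester 0 is considered first
        self.mem: dict[int, int] = {}

    def grant(self, requesting: set[int], backdoor: bool = False):
        """Return "backdoor", the index of the granted requester, or None."""
        if backdoor:                     # back-door access has highest priority
            return "backdoor"
        for step in range(1, self.n + 1):
            candidate = (self.last + step) % self.n
            if candidate in requesting:
                self.last = candidate    # next search starts after this winner
                return candidate
        return None

    def read_add_write(self, addr: int, value: int, bits: int = 32) -> int:
        """Atomically add `value` to the word at `addr` (mod 2**bits); return the new value."""
        new = (self.mem.get(addr, 0) + value) % (1 << bits)
        self.mem[addr] = new
        return new
```

With requesters 1 and 3 both waiting, grants alternate 1, 3, 1, ..., so neither is starved.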

[0278] The Tree Search Engine uses the concept of trees to store and retrieve information. Retrieval (i.e., tree searches) as well as inserts and deletes are done based on a Key, which is a bit pattern such as a MAC source address or the concatenation of an IP source address and IP destination address. Information is stored in a control block called a Leaf, which contains at least the Key (as will be seen later, the stored bit pattern is actually the hashed Key). A Leaf can also contain additional information, such as aging information or user information, which can for example be forwarding information like target blade and target port numbers.

[0279] There are three tree types (FM, LPM and SMT) and associated tree type searches, namely: fixed match, longest prefix match and software managed tree. An optional additional criterion for checking the leaf during a tree search is the VectorMask. Roping, aging and a latch are used to increase search performance.

[0280] The search algorithm for FM trees is shown in FIG. 14. The search algorithm operates on input parameters, which include the Key; it performs a hash on the Key, accesses a Direct Table (DT), walks the tree through Pattern Search Control Blocks (PSCBs) and ends up at a Leaf. There are three types of trees, each with its own search algorithm, which causes the tree walk to occur according to different rules. For example, for Fixed Match (FM) trees, the data structure is a Patricia tree. When a Leaf has been found, this Leaf is the only possible candidate that can match the input Key. For Software Managed Trees, there can be multiple Leafs that are chained in a linked list. In this case, all Leafs in the chain are checked with the input Key until a match has been found or until the chain has been exhausted. A so-called “compare at the end” operation, which compares the input Key with the pattern stored in the Leaf, verifies whether the Leaf really matches the input Key. The result of the search is OK when the Leaf has been found and a match has occurred, or KO in all other cases.
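The “compare at the end” step can be sketched as a check over the candidate Leafs reached by the tree walk. This Python sketch simplifies the check to exact equality with the stored pattern; SMT rules may require the masked or range compares discussed later, and `compare_at_end` is a hypothetical name.

```python
from typing import Optional


def compare_at_end(candidates: list[tuple[int, object]],
                   key: int) -> tuple[str, Optional[object]]:
    """Return ("OK", leaf_data) for the first matching Leaf, else ("KO", None).

    candidates: (stored_pattern, leaf_data) pairs reached by the tree walk.
    For an FM tree there is at most one candidate; for an SMT tree this is
    the leaf chain, checked in order until a match or until it is exhausted.
    """
    for pattern, data in candidates:
        if pattern == key:
            return "OK", data
    return "KO", None
```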

[0281] The input to a search operation consists of the following parameters:

[0282] Key (128 bits). The Key must be built using special picocode instructions prior to the search (or insert/delete). There is only one Key register. However, after the tree search has started, the Key register can be used by the picocode to build the key for the next search, concurrently with the TSE performing the search. This is because the TSE hashes the Key and stores the result in an internal HashedKey register (thus, in reality, there are 2 Key registers).

[0283] KeyLength (7 bits). This register contains the length of the Key in bits. It is automatically updated by hardware during building of the Key.

[0284] LUDefIndex (8 bits). This is an index into the LUDefTable, which contains a full definition of the tree in which the search occurs. The LUDefTable is described in detail later.

[0285] TSRNr (1 bit). The search results can be stored either in Tree Search Result Area 0 (TSR0) or TSR1, as specified by TSRNr. While the TSE is searching, the picocode can access the other TSR to analyze the results of a previous search.

[0286] VectorIndex (6 bits). For trees which have the VectorMask enabled (which is specified in the LUDefTable), the VectorIndex denotes a bit in the VectorMask. At the end of the search, the value of this bit is returned and can be used by picocode.

[0287] The input Key is hashed into a HashedKey, as shown in FIG. 14. There are six fixed hash algorithms available (one “algorithm” performs no hash function). The LUDefTable specifies which algorithm is used. A programmable hash function may be used to add flexibility.

[0288] The output of the hash function is always a 128-bit number, which has the property that there is a one-to-one correspondence between the original input Key and the output of the hash function. As will be explained below, this property minimizes the depth of the tree that starts after the Direct Table.

[0289] If colors are enabled for the tree, which is the case in the example of FIG. 14, the 16-bit color register is inserted in the 128-bit hash function output. The insertion occurs directly after the Direct Table bits; i.e., if the Direct Table contains 2^N entries, then the 16-bit color value is inserted at bit position N, as shown in the figure. The output of the hash function, together with the inserted color value (when enabled), is stored in the HashedKey register.
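The color insertion can be expressed as bit manipulation on the 128-bit hash output. In this minimal Python sketch, bit position N is counted from the most significant bit, so that the N Direct Table bits stay on top; that reading of “bit position N” and the function name are assumptions.

```python
HASH_BITS = 128
COLOR_BITS = 16


def insert_color(hashed: int, color: int, n: int) -> int:
    """Insert a 16-bit color directly after the N most significant (DT index) bits.

    The result is a (128 + 16)-bit value laid out as
    [N DT bits][16-bit color][remaining 128 - N hash bits].
    """
    if not (0 <= hashed < 1 << HASH_BITS and 0 <= color < 1 << COLOR_BITS
            and 0 <= n <= HASH_BITS):
        raise ValueError("argument out of range")
    low_bits = HASH_BITS - n
    high = hashed >> low_bits                 # the N bits that index the Direct Table
    low = hashed & ((1 << low_bits) - 1)      # bits that follow the Direct Table
    return (((high << COLOR_BITS) | color) << low_bits) | low
```

Because the DT index bits are untouched, the Direct Table lookup is the same with or without colors; only the tree walk below the DT sees the color.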

[0290] The hash function is defined such that most entropy in its output resides in the highest bits. The N highest bits of the HashedKey register are used to calculate an index into the Direct Table (DT).

[0291] The search starts with an access into the Direct Table: a DTEntry is read from the Direct Table. The address used to read the DTEntry is calculated from the N highest bits of the HashedKey, as well as from tree properties as defined in the LUDefTable. This is explained in detail below. The DTEntry can be seen as the root of a tree. The particular tree data structure that is used depends on the tree type. At this point it suffices to say that a Patricia tree data structure is used for FM trees, and extensions to Patricia trees for LPM and SMT trees.

[0292] An example of the use of an 8-entry DT is shown in FIG. 15. It can be seen that the search time (i.e., the number of PSCBs that must be accessed) can be reduced by using a DT. Thus, by increasing the DT size, a trade-off can be made between memory usage and search performance.

[0293] As can be seen from FIG. 15, a DTEntry can contain the followinginformation:

[0294] Empty. There are no Leafs attached to this DTEntry.

[0295] A pointer to a Leaf. There is a single Leaf attached to this DTEntry.

[0296] A pointer to a PSCB. There is more than one Leaf attached to this DTEntry. The DTEntry defines the root of a tree.
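Putting the pieces together, a fixed-match search indexes the DT with the N highest HashedKey bits, follows PSCBs by testing single key bits until it reaches a Leaf or an empty entry, and finishes with a compare at the end. This Python sketch assumes a simplified Patricia node (`PSCB`) that tests one bit, counted from the most significant bit, and ignores the LUDefTable contribution to the DT address and the color field; all names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Union

KEY_BITS = 128  # HashedKey width (colors not modeled)


@dataclass
class Leaf:
    pattern: int   # stored (hashed) key used by the compare at the end
    data: object   # e.g. forwarding information


@dataclass
class PSCB:
    bit: int       # key bit tested here, counted from the MSB (0 = MSB)
    zero: "Node"   # subtree followed when the tested bit is 0
    one: "Node"    # subtree followed when the tested bit is 1


Node = Union[None, Leaf, PSCB]   # None models an empty DTEntry


def fm_search(dt: list[Node], n: int, hashed_key: int) -> tuple[str, Optional[object]]:
    """Fixed-match search over a 2**n-entry Direct Table."""
    node = dt[hashed_key >> (KEY_BITS - n)]            # N highest bits index the DT
    while isinstance(node, PSCB):                      # walk the Patricia tree
        bit = (hashed_key >> (KEY_BITS - 1 - node.bit)) & 1
        node = node.one if bit else node.zero
    if isinstance(node, Leaf) and node.pattern == hashed_key:   # compare at the end
        return "OK", node.data
    return "KO", None
```

The three DTEntry cases listed above map directly onto `None`, `Leaf` and `PSCB`.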

[0297] The search algorithm for a software managed tree and the algorithm for generating the tree are set forth in U.S. patent application Ser. No. 09/312,148, incorporated herein by reference.

[0298] An algorithm termed the “Choice Bit Algorithm” uses a certain metric to build a binary search tree based upon bits selected from items termed “rules” in a set or universe of rules. All our examples are couched in terms of Internet Protocol (IP) headers, but a fixed-format header of any type could be used instead.

[0299] In IP, each rule pertains to certain Keys which might be built with the following subsections: Source Address (SA), Destination Address (DA), Source Port (SP), Destination Port (DP), and Protocol (P). These data are respectively 32, 32, 16, 16, and 8 bits long, so a Key to be tested consists of 104 bits. The Choice Bit Algorithm finds certain of the 104 bits which are especially useful. Testing these few bits in effect eliminates all but one or all but a few rules from possible application. For some rules, testing inequalities by means of simple compare operations is also appropriate. The bit tests and compares are logically organized in a binary tree. The tree is mapped into a hardware-enabled structure that tests bits at high speeds. Such testing results in just one rule or a small number of rules (called a leaf chain) which the Key might fit. In the former case, the Key is then tested in full by the rule. In the latter case, the Key is then tested in a lattice of tests using compares and full rule tests.
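Building the 104-bit key and selecting individual bits from it, as the Choice Bit Algorithm does, might look like this in Python. The concatenation order SA, DA, SP, DP, P follows the listing above, but the hardware's actual field order is an assumption, as are the helper names.

```python
def build_key(sa: int, da: int, sp: int, dp: int, proto: int) -> int:
    """Concatenate SA|DA|SP|DP|P into one 104-bit key (32+32+16+16+8 bits)."""
    fields = ((sa, 32), (da, 32), (sp, 16), (dp, 16), (proto, 8))
    key = 0
    for value, width in fields:
        if not 0 <= value < 1 << width:
            raise ValueError(f"field value {value} does not fit in {width} bits")
        key = (key << width) | value
    return key


def key_bit(key: int, pos: int, key_len: int = 104) -> int:
    """Return bit `pos` of the key, counting from the most significant bit (0)."""
    return (key >> (key_len - 1 - pos)) & 1


k = build_key(0x0A000001, 0xC0A80001, 1234, 80, 6)   # 10.0.0.1 -> 192.168.0.1, TCP
print(key_bit(k, 4))  # 1
```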

[0300] Each rule in the rule set is associated with an action which is taken if the rule is the highest priority rule which fits the key. Rules can intersect (one key fits two or more rules). In that case, rules can be given priority numbers 1, 2, 3, . . . , so that any two intersecting rules have different priorities (an administrator must declare which rule dominates if a key fits two or more). Thus, if more than one rule remains to be tested after the bit tests and compares, the rules are tested in order of priority. A lower priority number designates a rule with higher priority.

[0301] If no fit is found at all, some default provision may be specified.
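The priority resolution among the rules left after the bit tests can be sketched as follows. In this Python illustration, each rule is a (priority number, full test, action) tuple; that representation and the `resolve` function are hypothetical.

```python
from typing import Callable, Optional

# (priority number, full rule test, action); lower number = higher priority.
Rule = tuple[int, Callable[[int], bool], str]


def resolve(candidates: list[Rule], key: int, default: Optional[str] = None) -> Optional[str]:
    """Test the remaining rules in priority order; the first rule that fits wins.

    `default` models the optional default provision when no rule fits.
    """
    for _prio, fits, action in sorted(candidates, key=lambda r: r[0]):
        if fits(key):
            return action
    return default
```

Because intersecting rules have distinct priority numbers, sorting by that number gives a well-defined test order.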

[0302] The search algorithm for the longest prefix matching method is set forth in U.S. Pat. No. 5,787,430, incorporated herein by reference. The method requires entering the tree-like database at a node (the root node); determining a search path from one node to another through the tree-like database by successively processing segments of the search argument, which comprise only those parts of the entries that are necessary to identify the next (child) node, and the second link information, until the segments are consumed or a (leaf) node lacking second link information is reached; comparing an entry stored in the node at which the search path ended with the search argument; if no at least partial match between the search argument and that entry is found in the current node, backtracking the search path by processing the first link information of the current node; and repeating the previous two steps until an at least partial match is found or the root node is reached.
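The following Python sketch gives the same result as the backtracking search on a simple binary trie, but it uses a different, commonly used technique: it remembers the deepest stored prefix seen while walking down, instead of backtracking through first link information. It is not the method of U.S. Pat. No. 5,787,430, and the names are illustrative.

```python
from typing import Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: list = [None, None]   # child for bit 0 and bit 1
        self.entry: Optional[str] = None     # set when a stored prefix ends here


def insert_prefix(root: TrieNode, prefix: int, length: int, entry: str, width: int = 32) -> None:
    """Store `entry` under the first `length` bits of `prefix` (MSB first)."""
    node = root
    for i in range(length):
        bit = (prefix >> (width - 1 - i)) & 1
        if node.children[bit] is None:
            node.children[bit] = TrieNode()
        node = node.children[bit]
    node.entry = entry


def longest_prefix_match(root: TrieNode, key: int, width: int = 32) -> Optional[str]:
    """Return the entry of the longest stored prefix of `key`, or None."""
    best = root.entry
    node = root
    for i in range(width):
        node = node.children[(key >> (width - 1 - i)) & 1]
        if node is None:          # path ends: fall back to the last match seen
            break
        if node.entry is not None:
            best = node.entry
    return best
```

Remembering `best` on the way down plays the role of backtracking to the nearest ancestor whose entry at least partially matches the search argument.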

[0303] FIG. 16 shows an embodiment of the main switching fabric device. Preferably, each interface device chip has at least two integrated parallel-to-serial ports which receive parallel data and convert the data to a high-speed serial data stream, which is forwarded over a serial link to the switching fabric device. Data received from the switching fabric device on a high-speed serial link is converted to parallel data by another such port. An embodiment of this serializer/deserializer, termed the Data Align Serial Link (DASL), is described herein.

[0304] At least one DASL interfaces the switching fabric device to the serial links. Data from the serial link is converted into parallel data, which is delivered to the switching fabric device. Likewise, parallel data from the switching fabric device is converted to serial data, which is delivered to the serial links. The serial links can be aggregated to increase throughput.

[0305] Still referring to FIG. 16, the switching system includes switch fabric 11, input switch adapters 13 (13-1 . . . 13-k), which are connected to the switch fabric input ports 15 (15-1 . . . 15-k), and output switch adapters 17 (17-1 . . . 17-p), which are connected to the switch fabric at output ports 19 (19-1 . . . 19-p).

[0306] Incoming and outgoing transmission links 21 (21-1 . . . 21-q) and 23 (23-1 . . . 23-r) are connected to the switch system by line (link) adapters 25 (25-1 . . . 25-q) and 27 (27-1 . . . 27-r), respectively. The transmission links carry circuit-switched or packet-switched traffic from and to attached units such as work stations, telephone sets or the like (links designated WS), from and to local area networks (links designated LAN), from and to Integrated Services Digital Network facilities (links designated ISDN), or from and to any other communication systems. Furthermore, processors may be attached directly to switch adapters 13 and 17. The line adapters (LA) and switch adapters (SA) have a common interface.

[0307] At the input switch adapters, various services from packetswitched and circuit switched interfaces are collected and convertedinto uniform minipackets (having one of several possible fixed lengths),with a header containing routing information designating the requiredoutput port (and outgoing link) of the switch. Some details on theminipacket format and on minipacket generation in the input switchadapters and on depacketization in the output switch adapters will begiven in the next sections.

[0308] The switch fabric routes the minipackets via a fast self-routing interconnection network from any input port to any output port. The structure of the self-routing network is such that minipackets can be routed simultaneously internally without any conflicts.

[0309] The heart of the switching system is the switch fabric. Two different implementations are considered and will be described separately. In one implementation, the switch fabric comprises a self-routing binary tree for each input port, connecting the respective input port to all output ports; the switch fabric comprises k such trees in combination (if k input ports are provided). In the other implementation, a bus structure with an output RAM is provided as a slice for each output port, connecting all input ports to the respective output port; the switch fabric comprises p such slices in combination (if p output ports are provided).
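The two fabric implementations described above can be sketched as follows. This sketch assumes, beyond what the text states, that the number of output ports is a power of two, that each tree node is a 1:2 switch steered by one bit of the destination port number (most significant bit first), and that each output RAM behaves as a FIFO. The names `tree_path`, `leaf_reached`, and `BusFabric` are hypothetical. Because every input port owns its own tree, minipackets from different inputs never share a tree node, which is one way to see why they can be routed simultaneously.

```python
from collections import deque


def tree_path(output_port: int, levels: int) -> list[int]:
    """Branch taken at each level of one input port's self-routing binary tree.

    Each node inspects one bit of the destination port number, most
    significant bit first: 0 selects the upper branch, 1 the lower branch.
    """
    if not 0 <= output_port < 2 ** levels:
        raise ValueError("output port out of range for this tree")
    return [(output_port >> (levels - 1 - i)) & 1 for i in range(levels)]


def leaf_reached(path: list[int]) -> int:
    """Follow the branch decisions from the root; the leaf index is the output port."""
    leaf = 0
    for bit in path:
        leaf = (leaf << 1) | bit
    return leaf


class BusFabric:
    """Bus-and-output-RAM variant: one slice (address filter + RAM) per output port."""

    def __init__(self, num_outputs: int):
        self.output_ram = [deque() for _ in range(num_outputs)]

    def bus_cycle(self, minipackets: list[tuple[int, bytes]]) -> None:
        """Place each (output_port, payload) minipacket on the shared bus.

        Every slice observes every transfer; only the slice whose address
        matches the header copies the minipacket into its output RAM.
        """
        for port, payload in minipackets:
            for slice_id, ram in enumerate(self.output_ram):
                if slice_id == port:
                    ram.append(payload)

    def read(self, port: int) -> bytes:
        """Output adapter side: take the oldest minipacket from a slice's RAM."""
        return self.output_ram[port].popleft()
```

In the tree variant, an 8-port fabric needs three levels, and a minipacket addressed to port 5 (binary 101) takes the lower, upper, then lower branch. In the bus variant, the address filter in each slice does the same job of selecting the destination that the tree does with its bit-by-bit steering.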

[0310] In the drawings and specifications there has been set forth a preferred embodiment of the invention and, although specific terms are used, the description thus given uses terminology in a generic and descriptive sense only and not for purposes of limitation.

[0311] DASL is described in application Ser. No. 09/330,968, filed Jun. 11, 1999 and incorporated herein by reference. The DASL interface receives data from a parallel interface such as a CMOS ASIC and partitions the bits from the parallel interface into a smaller number of parallel bit streams. Each of these bit streams is then converted into a high speed serial stream, which is transported via a transmission medium to the receiver of the other module. A differential driver with controlled impedance drives the serial bit stream of data into the transmission medium.

[0312] DASL implements the method of parsing a data stream presented as N bits in parallel into a plurality of portions each having n bits, wherein n is a fraction of N; serializing each n bit portion of the data stream; transferring each serialized portion over a corresponding one of a plurality of parallel channels; and deserializing each transferred portion of the data stream to restore the data stream to presentation as N bits in parallel.
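The four steps of this method (parse, serialize, transfer, deserialize) can be sketched in software as follows. This is a behavioral model only, assuming that n divides N evenly, that portion 0 holds the least significant bits, and that bits are shifted out most significant bit first. These conventions and the function names are hypothetical; the model says nothing about the clocking, alignment, or electrical details of the actual DASL circuit.

```python
def parse(word: int, N: int, n: int) -> list[int]:
    """Split an N-bit parallel word into N // n portions of n bits each.

    Portion 0 holds the least significant n bits (an assumed convention).
    """
    if N % n:
        raise ValueError("this sketch assumes n divides N evenly")
    mask = (1 << n) - 1
    return [(word >> (i * n)) & mask for i in range(N // n)]


def serialize(portion: int, n: int) -> list[int]:
    """Shift one n-bit portion out as a serial bit stream, MSB first."""
    return [(portion >> (n - 1 - i)) & 1 for i in range(n)]


def deserialize(bits: list[int]) -> int:
    """Shift a received serial bit stream back into an n-bit portion."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def dasl_round_trip(word: int, N: int, n: int) -> int:
    """Model one parallel word crossing the link on N // n parallel channels."""
    # One serial channel per parsed portion; the channels run side by side.
    channels = [serialize(p, n) for p in parse(word, N, n)]
    portions = [deserialize(bits) for bits in channels]
    # Reassemble the portions to restore the N-bit parallel presentation.
    return sum(p << (i * n) for i, p in enumerate(portions))
```

With N = 32 and n = 8, a word is carried on four channels, each of which must shift 8 bits per parallel word. To sustain one word per parallel clock, each serial channel therefore signals at 8 times the parallel clock rate. This trade of wide, slow parallel buses for a few fast serial channels is what the parsing step buys.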

[0313] In the drawings and specifications there have been set forth preferred embodiments of the inventions here disclosed and, although specific terms are used, the description thus given uses terminology in a generic and descriptive sense only and not for purposes of limitation.

What is claimed is:
 1. Apparatus comprising: a control point processor; an interface device operatively connected to said control point processor and having: a semiconductor substrate; a plurality of interface processors formed on said substrate, the number of said processors being at least five; internal instruction memory formed on said substrate and storing instructions accessibly to said interface processors; internal data memory formed on said substrate and storing data passing through said device accessibly to said interface processors; and a plurality of input/output ports formed on said substrate; at least one of said input/output ports connecting said internal data memory with external data memory; at least two other of said input/output ports exchanging data passing through the interface device with an external network under the direction of said interface processors; said control point processor cooperating with said interface device by loading into said instruction memory instructions to be executed by said interface processors in directing the exchange of data between said data exchange input/output ports and the flow of data through said data memory; and a self routing switching fabric device operatively connected to said interface device and directing data inbound to the apparatus from identifiable addresses to flow outbound from the apparatus to identified addresses.
 2. Apparatus according to claim 1 further comprising: a second interface device operatively connected to said control point processor and having: a semiconductor substrate; a plurality of interface processors formed on said substrate, the number of said processors being at least five; internal instruction memory formed on said substrate and storing instructions accessibly to said interface processors; internal data memory formed on said substrate and storing data passing through said device accessibly to said interface processors; and a plurality of input/output ports formed on said substrate; at least one of said input/output ports connecting said internal data memory with external data memory; at least two other of said input/output ports exchanging data passing through the interface device with an external network under the direction of said interface processors; said control point processor cooperating with said second interface device by loading into said instruction memory instructions to be executed by said interface processors in directing the exchange of data between said data exchange input/output ports and the flow of data through said data memory; said second interface device being operatively connected to said self routing switching fabric device; and said interface device and said second interface device being linked one to the other and cooperating so that data flow through the apparatus is divided therebetween.
 3. Apparatus according to claim 2 further comprising a second self routing switching fabric device operatively connected to said interface devices and serving as a secondary device to direct data inbound to the apparatus from identifiable addresses to flow outbound from the apparatus to identified addresses in the event of failure of said switching fabric device.
 4. Apparatus comprising: a self routing switching fabric device directing data inbound to the apparatus from identifiable addresses to flow outbound from the apparatus to identified addresses, said switching fabric device having an input port and an output port for data flow therethrough; a control point processor; and a plurality of interface devices operatively connected to said switching fabric device and to said control point processor, each of said interface devices having: a semiconductor substrate; a plurality of interface processors formed on said substrate, the number of said processors being at least five; internal instruction memory formed on said substrate and storing instructions accessibly to said interface processors; internal data memory formed on said substrate and storing data passing through said device accessibly to said interface processors; and a plurality of input/output ports formed on said substrate; at least one of said input/output ports connecting said internal data memory with external data memory; at least two other of said input/output ports exchanging data passing through the interface device with an external network under the direction of said interface processors; said control point processor cooperating with said interface device by loading into said instruction memory instructions to be executed by said interface processors in directing the exchange of data between said data exchange input/output ports and the flow of data through said data memory; and a plurality of said interface devices being operatively connected with each of the input and output ports of said switching fabric device.
 5. Apparatus according to claim 4 wherein data flowing into said apparatus is passed through an interface device inbound to said switching fabric device.
 6. Apparatus according to claim 4 wherein data flowing from said apparatus is passed through an interface device outbound from said switching fabric device.
 7. Apparatus according to claim 4 wherein data flowing into said apparatus is passed through an interface device inbound to said switching fabric device and data flowing from said apparatus is passed through an interface device outbound from said switching fabric device.
 8. Apparatus comprising: a self routing switching fabric device directing data inbound to the apparatus from identifiable addresses to flow outbound from the apparatus to identified addresses, said switching fabric device having an input port and an output port for data flow therethrough; a control point processor; and an interface device operatively connected to said switching fabric device and to said control point processor, said interface device having: a semiconductor substrate; a plurality of interface processors formed on said substrate, the number of said processors being at least five; internal instruction memory formed on said substrate and storing instructions accessibly to said interface processors; internal data memory formed on said substrate and storing data passing through said device accessibly to said interface processors; and a plurality of input/output ports formed on said substrate; at least one of said input/output ports connecting said internal data memory with external data memory; at least two other of said input/output ports exchanging data passing through the interface device with an external network under the direction of said interface processors; at least two other of said input/output ports accomplishing a high speed interconnection between said interface device and said switching fabric device; said control point processor cooperating with said interface device by loading into said instruction memory instructions to be executed by said interface processors in directing the exchange of data between said data exchange input/output ports and the flow of data through said data memory.
 9. Apparatus according to claim 8 wherein each of said high speed interconnection input/output ports has: a logic circuit which parses a data stream presented as N bits in parallel into a plurality of portions each having n bits, where n is a fraction of N; a serializer communicating with said logic circuit which serializes each parsed portion of the data stream; a plurality of channels communicating with said serializer and transferring data streams, the number of said channels being equal to the number of parsed portions and one of said channels being associated with each of said parsed portions of the data stream; and a deserializer communicating with said channels and which receives serial data streams transferred therethrough and restores the data stream to presentation as N bits in parallel.
 10. Apparatus comprising: a control point processor; an interface device operatively connected to said control point processor and having: a semiconductor substrate; a plurality of interface processors formed on said substrate, the number of said processors being at least five; internal instruction memory formed on said substrate and storing instructions accessibly to said interface processors; internal data memory formed on said substrate and storing data passing through said device accessibly to said interface processors; and a plurality of input/output ports formed on said substrate; at least one of said input/output ports connecting said internal data memory with external data memory; at least two other of said input/output ports exchanging data passing through the interface device with an external network under the direction of said interface processors; said control point processor cooperating with said interface device by loading into said instruction memory instructions to be executed by said interface processors in directing the exchange of data between said data exchange input/output ports and the flow of data through said data memory; and said interface device having two high speed interconnection ports, each of which has: a logic circuit which parses a data stream presented as N bits in parallel into a plurality of portions each having n bits, where n is a fraction of N; a serializer communicating with said logic circuit which serializes each parsed portion of the data stream; a plurality of channels communicating with said serializer and transferring data streams, the number of said channels being equal to the number of parsed portions and one of said channels being associated with each of said parsed portions of the data stream; and a deserializer communicating with said channels and which receives serial data streams transferred therethrough and restores the data stream to presentation as N bits in parallel.
 11. Apparatus according to claim 10 wherein each of said high speed interconnection ports is connected to the other of said high speed interconnection ports whereby data flow through said interface device passes from one interface processor thereof through said interconnection ports to another of said interface processors.
 12. Apparatus according to claim 10 further comprising a self routing switching fabric device operatively directing data inbound to the apparatus from identifiable addresses to flow outbound from the apparatus to identified addresses and wherein each of said high speed interconnection ports is connected to said switching fabric device whereby data flow through the apparatus passes inbound from one interface processor to said switching fabric device and outbound from said switching fabric device through another interface processor.
 13. A method comprising the steps of: receiving a data flow inbound through an input port of an interface device; communicating the inbound data flow through a plurality of interface processors embedded in the interface device; dividing the inbound data flow into a first portion to be redirected by a switching fabric device and a second portion to be temporarily stored apart from the switching fabric device; directing the first portion to a switching fabric device and the second portion to a memory element; receiving the first portion at an interface processor as a data flow outbound from the switching fabric device; recombining the first and second portions; and directing the recombined data flow outbound through an output port in accordance with the execution of the instructions by the interface processors.
 14. A method according to claim 13 wherein the step of communicating the data flow through the plurality of interface processors comprises parsing the data flow into portions and distributing the parsed portions among the plurality of interface processors for handling in parallel.